# Demo


This demo uses Python to read data from the GitHub search API and write it to the blob-storage-backed mount point.



In [1]:
import json
import requests

req = requests.get("https://api.github.com/search/repositories?q=tetris&sort=stars&order=desc")

if not req.ok:
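    # Raising a plain string is a TypeError in Python 3; raise a real exception instead
    raise RuntimeError(f"Unable to download data (HTTP {req.status_code})")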

with open("/media/polydata/data/github-tetris.json", "w") as f:
    json.dump(req.json(), f)
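
Before handing the file to Spark, it can be worth confirming that it round-trips and actually contains the `items` array the next step relies on. The following is only a minimal sketch: `write_and_verify` is a hypothetical helper, the payload is made-up sample data, and a temporary directory stands in for the real mount point.

```python
import json
import os
import tempfile


def write_and_verify(payload, path):
    """Write payload as JSON to path, read it back, and return the item count."""
    with open(path, "w") as f:
        json.dump(payload, f)
    with open(path) as f:
        loaded = json.load(f)
    # The Spark step explodes "items", so fail early if it is missing
    if "items" not in loaded:
        raise ValueError("search response has no 'items' array")
    return len(loaded["items"])


# Hypothetical stand-in for a search response (not real API output)
payload = {"total_count": 1, "items": [{"full_name": "example/tetris"}]}

# A temporary directory is used here instead of /media/polydata/data
with tempfile.TemporaryDirectory() as tmp:
    count = write_and_verify(payload, os.path.join(tmp, "github-tetris.json"))

print(count)
```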

Next we read the downloaded data back from the file system into a Spark DataFrame and perform the following:

* Explode the `items` array so that each repository becomes its own row
* Filter to only the few columns we're interested in
* Cast the numeric and timestamp values into their correct data types

*Note:* Polynote has already created the SparkSession as `spark`, and has imported the functions and implicits.



In [2]:
import org.apache.spark.sql.types.{TimestampType, IntegerType}

val df = spark.read.json("/media/polydata/data/github-tetris.json")
    .withColumn("items", explode($"items"))
    .select("items.*")
    .select("full_name", "description", "fork", "language", "open_issues", "updated_at", "watchers")
    .withColumn("open_issues", $"open_issues".cast(IntegerType))
    .withColumn("watchers", $"watchers".cast(IntegerType))
    .withColumn("updated_at", $"updated_at".cast(TimestampType))
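
To make the explode, select, and cast steps concrete without a Spark session, here is a rough plain-Python equivalent. It is a sketch under stated assumptions, not what Spark does internally: `sample_response` is made-up data in the general shape of a search response, and `flatten`, `parse_timestamp`, and `to_int` are hypothetical helper names.

```python
from datetime import datetime, timezone

# Hypothetical, trimmed-down payload: a top-level object whose "items"
# array holds one entry per repository (values are invented).
sample_response = {
    "total_count": 2,
    "items": [
        {"id": 1, "full_name": "example/tetris-js", "description": "Sample entry",
         "fork": False, "language": "JavaScript", "open_issues": 3,
         "updated_at": "2020-01-02T03:04:05Z", "watchers": 120},
        {"id": 2, "full_name": "example/tetris-py", "description": None,
         "fork": False, "language": "Python", "open_issues": 0,
         "updated_at": "2019-06-07T08:09:10Z", "watchers": 50},
    ],
}

COLUMNS = ["full_name", "description", "fork", "language",
           "open_issues", "updated_at", "watchers"]


def parse_timestamp(value):
    """Parse a UTC timestamp of the form 2020-01-02T03:04:05Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def to_int(value):
    """Cast to int, keeping missing values as None."""
    return None if value is None else int(value)


def flatten(response):
    rows = []
    for item in response["items"]:                         # explode: one row per item
        row = {name: item.get(name) for name in COLUMNS}   # select: keep chosen columns
        row["open_issues"] = to_int(row["open_issues"])    # cast numeric columns
        row["watchers"] = to_int(row["watchers"])
        row["updated_at"] = parse_timestamp(row["updated_at"])  # cast timestamp
        rows.append(row)
    return rows


rows = flatten(sample_response)
for row in rows:
    print(row["full_name"], row["watchers"], row["updated_at"].isoformat())
```

Each element of `items` becomes one row, fields outside `COLUMNS` (such as `id`) are dropped, and the remaining values carry proper types rather than raw JSON strings.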

Now we perform a simple aggregation to get, for each language, the number of projects and the total number of open issues and watchers, ordered by total watchers.

In [4]:
df.groupBy($"language")
    .agg(
        count($"full_name").alias("num_projects"),
        sum($"open_issues").alias("total_open_issues"),
        sum($"watchers").alias("total_watchers"))
    .orderBy($"total_watchers".desc)
    .show()

+----------+------------+-----------------+--------------+
|  language|num_projects|total_open_issues|total_watchers|
+----------+------------+-----------------+--------------+
|JavaScript|          10|               56|         13272|
|     Swift|           2|                5|          1633|
|    Python|           4|                3|          1286|
|       C++|           3|               10|          1271|
|  Assembly|           2|                2|           890|
|      Dart|           1|                0|           839|
|   Clojure|           2|               17|           581|
|   Haskell|           1|                0|           422|
|       Lua|           1|                2|           413|
|      HTML|           1|               17|           388|
|     Shell|           1|                2|           344|
|      Java|           1|                0|           326|
|        Go|           1|                0|           217|
+----------+------------+-----------------+--------------+
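
For readers who want to see what the `groupBy` / `agg` / `orderBy` chain computes, the same logic can be sketched in plain Python. The rows below are invented for illustration and `summarize_by_language` is a hypothetical helper. One caveat: Spark's `count` on a column skips nulls, whereas this sketch assumes `full_name` is always present and simply counts rows.

```python
from collections import defaultdict

# Hypothetical rows in the shape of the DataFrame above (values are invented)
rows = [
    {"full_name": "example/a", "language": "JavaScript", "open_issues": 3, "watchers": 120},
    {"full_name": "example/b", "language": "JavaScript", "open_issues": 1, "watchers": 80},
    {"full_name": "example/c", "language": "Python", "open_issues": 0, "watchers": 50},
]


def summarize_by_language(rows):
    """Group rows by language, then count projects and sum issues and watchers."""
    totals = defaultdict(
        lambda: {"num_projects": 0, "total_open_issues": 0, "total_watchers": 0}
    )
    for row in rows:
        entry = totals[row["language"]]           # groupBy language
        entry["num_projects"] += 1                # count (assumes full_name is set)
        entry["total_open_issues"] += row["open_issues"]
        entry["total_watchers"] += row["watchers"]
    summary = [{"language": lang, **vals} for lang, vals in totals.items()]
    # orderBy total_watchers descending
    return sorted(summary, key=lambda r: r["total_watchers"], reverse=True)


summary = summarize_by_language(rows)
for line in summary:
    print(line)
```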