# Setup Test

The goal of this notebook is to make sure that your local setup works and that you can successfully connect to the autograding server that we provide for later assignments.

### Before you begin - automark

To check whether the code you've written is correct, we'll use **automark**. We have created an account for each of you, with your student number as the username.

In [None]:
import automark as am

# fill in your student number as your username
am.configure(username='13646168') # Test 1

# to check your progress, you can run this function
am.get_progress()

### SQL with DuckDB - Setup test

First, we test whether [DuckDB](https://duckdb.org/), an embedded database that lets us run SQL queries, works on your machine.


In [None]:
import duckdb
import pandas as pd

sailors_data = {
    'sid': [1, 2, 3],
    'sname': ["Fred", "Nancy", "Ji"],
    'experience': [7, 2, 8],
    'age': [22, 39, 27]
}

sailors = pd.DataFrame.from_dict(sailors_data)

sailors

The following helper function allows us to run queries on the database:

In [None]:
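def execute_local(query):
    # Open a fresh in-memory DuckDB database for each call.
    con = duckdb.connect(database=':memory:', read_only=False)
    # Expose the pandas DataFrame to SQL under the table name `sailors`.
    con.register('sailors', sailors)

    # Run the query and return the result as a pandas DataFrame.
    result = con.execute(query).fetchdf()

    return result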

#### Test task

In this test task, we compute the average experience of sailors who are younger than 35. Copy the following SQL query into the appropriate location in the `a0_t1_sailor_avg_experience` function below.

`SELECT AVG(experience) FROM sailors WHERE age < 35;`

In [None]:
def a0_t1_sailor_avg_experience():
    query = '''
    REPLACE_THIS_TEXT_WITH_THE_SQL_QUERY    
    '''

    return query

Now you can test the query on the local `sailors` data via the helper function. This should return a single tuple with an attribute `avg(experience)` and a value of 7.5.

In [None]:
execute_local(a0_t1_sailor_avg_experience())
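
If you want to see where the expected value of 7.5 comes from, the query can be traced by hand. The sketch below uses plain Python only and does not touch DuckDB; the names `sailors_rows`, `young`, and `expected_avg` are made up for this illustration and are not part of the assignment or of automark.

```python
# Hand check of the expected result, using plain Python only.
# The rows mirror the `sailors_data` dictionary defined earlier in this notebook.
sailors_rows = [
    {"sid": 1, "sname": "Fred", "experience": 7, "age": 22},
    {"sid": 2, "sname": "Nancy", "experience": 2, "age": 39},
    {"sid": 3, "sname": "Ji", "experience": 8, "age": 27},
]

# WHERE age < 35: keep only the sailors younger than 35.
young = [row for row in sailors_rows if row["age"] < 35]

# AVG(experience): mean experience over the remaining rows.
expected_avg = sum(row["experience"] for row in young) / len(young)

print([row["sname"] for row in young])  # ['Fred', 'Ji']
print(expected_avg)                     # 7.5
```

Nancy is filtered out because she is 39, so the average is taken over Fred and Ji only: (7 + 8) / 2 = 7.5.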

Finally, you can have the autograding server test your function by executing the following cell.

In [None]:
am.test_student_function(a0_t1_sailor_avg_experience)

### Dataflows with PySpark - Setup Test 

Next, we test whether you can run [PySpark](https://spark.apache.org/docs/latest/api/python/) programs locally on your computer. For that, we first set up a local PySpark session.

Note that you can ignore the following warnings that might occur when executing the next cell:

```
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/local/spark-3.2.0-bin-hadoop3.2/jars/spark-unsafe_2.12-3.2.0.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
```

and

```
 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
```

In [None]:
from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession.builder \
    .master("local") \
    .config("spark.driver.bindAddress", "127.0.0.1") \
    .getOrCreate()

Next, we turn our sailors data into a PySpark dataframe.

In [None]:
sailors_df = spark.createDataFrame(sailors)

#### Test task

In this test task, we again compute the average experience of sailors who are younger than 35. Copy the following PySpark code into the appropriate location in the `a0_t2_sailor_avg_experience_pyspark` function below.

`return sailors_data.filter(sailors_data['age'] < 35).agg({"experience": "avg"})`

In [None]:
def a0_t2_sailor_avg_experience_pyspark(sailors_data):
    # REPLACE THE LINE BELOW WITH THE PYSPARK CODE.
    return sailors_data

Now you can test the Spark program on the local `sailors_df` data. This should again return a single tuple with an attribute `avg(experience)` and a value of 7.5.

In [None]:
result = a0_t2_sailor_avg_experience_pyspark(sailors_df)
result.toPandas()

Finally, you can have the autograding server test your function by executing the following cell.

In [None]:
am.test_student_function(a0_t2_sailor_avg_experience_pyspark)