# Pandas - Italian poets challenge

## [Download exercises](../_static/generated/pandas.zip)


For a digital humanities project you need to display poets by filtering a CSV table according to various criteria. This challenge is only about querying with pandas, which you might find convenient during exams to quickly understand a dataset's content (using pandas will always be optional, and you will never be asked to perform complex modifications with it).

You are given a dataset taken from [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page), a project by the Wikimedia Foundation that aims to store only machine-readable data, such as numbers and strings, interlinked with many references. Each entity in Wikidata has an identifier: for example, Dante Alighieri is the entity [Q1067](https://www.wikidata.org/wiki/Q1067) and Florence is [Q2044](https://www.wikidata.org/wiki/Q2044).

Wikidata can be queried using the SPARQL language: the data was obtained with [this query](https://query.wikidata.org/#%23defaultView%3AMap%7B%22hide%22%3A%20%5B%22%3Fcoord%22%5D%7D%0ASELECT%20%3Fsubj%20%3FsubjLabel%20%3Fplace%20%3FplaceLabel%20%3Fcoord%20%3Fbirthyear%0AWHERE%20%7B%0A%20%20%20%3Fsubj%20wdt%3AP106%20wd%3AQ49757%20.%0A%20%20%20%3Fsubj%20wdt%3AP19%20%3Fplace%20.%0A%20%20%20%3Fplace%20wdt%3AP17%20wd%3AQ38%20.%0A%20%20%20%3Fplace%20wdt%3AP625%20%3Fcoord%20.%0A%20%20%20OPTIONAL%20%7B%20%3Fsubj%20wdt%3AP569%20%3Fdob%20%7D%0A%20%20%20BIND%28YEAR%28%3Fdob%29%20as%20%3Fbirthyear%29%0ASERVICE%20wikibase%3Alabel%20%7B%20%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22%20%7D%0A%7D) and downloaded in CSV format (one of the many available formats). Even if it is not necessary for the exercise, you are invited to play a bit with the interface, for example by trying different visualizations (e.g. try selecting Map in the middle-left corner), or to look at [other examples](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples).

### What to do

1. If you haven't already, install Pandas:

    Anaconda:

    `conda install pandas`

    Without Anaconda (`--user` installs in your home):

    `python3 -m pip install --user pandas`


2. Unzip the exercises into a folder; you should get something like this:

```
 pandas     
     eures-jobs.ipynb
     eures-jobs-sol.ipynb     
     italian-poets-chal.ipynb
     pandas.ipynb     
     pandas-sol.ipynb     
     jupman.py
```

<div class="alert alert-warning">

**WARNING 1**: to correctly visualize the notebook, it MUST be in an unzipped folder!
</div>


3. Open Jupyter Notebook from that folder. Two things should open: first a console and then a browser.
4. The browser should show a file list: navigate the list and open the notebook `pandas/italian-poets-chal.ipynb`

<div class="alert alert-warning">

**WARNING 2**: DO NOT use the _Upload_ button in Jupyter; instead, navigate in the Jupyter browser to the unzipped folder!
</div>

5. Keep reading that notebook and follow the instructions inside.


Shortcut keys:

- to execute Python code inside a Jupyter cell, press `Control + Enter`
- to execute Python code inside a Jupyter cell AND select next cell, press `Shift + Enter`
- to execute Python code inside a Jupyter cell AND create a new cell afterwards, press `Alt + Enter`
- If the notebook looks stuck, try selecting `Kernel -> Restart`

## Load the dataset

```python
import pandas as pd   # we import pandas and for ease we rename it to 'pd'
import numpy as np    # we import numpy and for ease we rename it to 'np'

df = pd.read_csv('italian-poets.csv', encoding='UTF-8')
```
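In case the CSV file is not at hand, here is a minimal sketch of the same `pd.read_csv` call on a small in-memory CSV. The rows are made up, and the column names `subjLabel`, `placeLabel` and `birthyear` are an assumption based on the variables of the SPARQL query (the real file may contain more columns, such as `subj`, `place` and `coord`). It shows one detail worth knowing: the query fetches the birth date as `OPTIONAL`, so some rows may lack a `birthyear`, and pandas then stores that whole column as floats, with `NaN` for the gaps.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'italian-poets.csv': made-up rows, assumed column names
csv_text = """subjLabel,placeLabel,birthyear
Antonio Rossi,Verona,1450
Cesare Verdi,Verona,
"""

# Same call as above, but reading from a text buffer instead of a file
df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)                     # birthyear is float64 because of the empty cell
print(df['birthyear'].isna().sum())  # number of rows without a birth year
```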

## Tell me more

Show some info about the dataset

```python
# write here
```
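One possible way to get an overview, sketched on a made-up toy frame whose column names (`subjLabel`, `placeLabel`, `birthyear`) are assumed from the SPARQL variables: `DataFrame.info` prints column names, non-null counts and dtypes, while `DataFrame.describe` summarizes the numeric columns.

```python
import pandas as pd

# Made-up toy rows; column names are an assumption, not read from the real CSV
df = pd.DataFrame(
    [('Antonio Rossi', 'Verona', 1450), ('Paolo Antonio Bianchi', 'Rome', 1687),
     ('Cesare Verdi', 'Verona', 1820), ('Marco Neri', 'Catania', -500),
     ('Cesare Verdi', 'Verona', 1820), ('Luca Gialli', 'Catania', 1600)],
    columns=['subjLabel', 'placeLabel', 'birthyear'])

df.info()             # columns, non-null counts, dtypes, memory usage
print(df.describe())  # count, mean, std, min, quartiles and max of birthyear
```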


## Getting in shape

Show the number of rows and columns:

```python
# write here
```
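`DataFrame.shape` holds both counts as a `(rows, columns)` tuple. A sketch on a made-up toy frame (column names assumed from the SPARQL variables):

```python
import pandas as pd

# Made-up toy rows; column names are an assumption, not read from the real CSV
df = pd.DataFrame(
    [('Antonio Rossi', 'Verona', 1450), ('Paolo Antonio Bianchi', 'Rome', 1687),
     ('Cesare Verdi', 'Verona', 1820), ('Marco Neri', 'Catania', -500),
     ('Cesare Verdi', 'Verona', 1820), ('Luca Gialli', 'Catania', 1600)],
    columns=['subjLabel', 'placeLabel', 'birthyear'])

n_rows, n_cols = df.shape  # shape is an attribute, not a method: no parentheses
print(n_rows, n_cols)
```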


## 10 rows 

Display the first 10 rows

```python
# write here
```
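`DataFrame.head(n)` returns the first `n` rows (or fewer, if the frame is shorter). A sketch on a made-up toy frame with assumed column names:

```python
import pandas as pd

# Made-up toy rows; column names are an assumption, not read from the real CSV
df = pd.DataFrame(
    [('Antonio Rossi', 'Verona', 1450), ('Paolo Antonio Bianchi', 'Rome', 1687),
     ('Cesare Verdi', 'Verona', 1820), ('Marco Neri', 'Catania', -500),
     ('Cesare Verdi', 'Verona', 1820), ('Luca Gialli', 'Catania', 1600)],
    columns=['subjLabel', 'placeLabel', 'birthyear'])

first_ten = df.head(10)  # at most 10 rows: this toy frame only has 6
print(first_ten)
```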


## Born in Verona

Display all people born in Verona

```python
# write here
```
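Filtering works with a boolean mask: comparing a column with a value gives a Series of `True`/`False`, and indexing the frame with it keeps only the `True` rows. A sketch on made-up toy rows, assuming the birth place label is in a column called `placeLabel`:

```python
import pandas as pd

# Made-up toy rows; column names are an assumption, not read from the real CSV
df = pd.DataFrame(
    [('Antonio Rossi', 'Verona', 1450), ('Paolo Antonio Bianchi', 'Rome', 1687),
     ('Cesare Verdi', 'Verona', 1820), ('Marco Neri', 'Catania', -500),
     ('Cesare Verdi', 'Verona', 1820), ('Luca Gialli', 'Catania', 1600)],
    columns=['subjLabel', 'placeLabel', 'birthyear'])

mask = df['placeLabel'] == 'Verona'  # one True/False per row
born_in_verona = df[mask]            # keep only the rows where mask is True
print(born_in_verona)
```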


## How many people in Verona

Display how many people were born in Verona

```python
# write here
```
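Since `True` counts as 1 and `False` as 0, summing a boolean mask counts the matching rows; taking the `len` of the filtered frame gives the same number. A sketch on made-up toy rows with an assumed `placeLabel` column:

```python
import pandas as pd

# Made-up toy rows; column names are an assumption, not read from the real CSV
df = pd.DataFrame(
    [('Antonio Rossi', 'Verona', 1450), ('Paolo Antonio Bianchi', 'Rome', 1687),
     ('Cesare Verdi', 'Verona', 1820), ('Marco Neri', 'Catania', -500),
     ('Cesare Verdi', 'Verona', 1820), ('Luca Gialli', 'Catania', 1600)],
    columns=['subjLabel', 'placeLabel', 'birthyear'])

mask = df['placeLabel'] == 'Verona'
n_verona = mask.sum()  # equivalent to len(df[mask])
print(n_verona)
```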


## Python is everywhere

Show poets born in Catania in the year -500 (I swear I did not alter the dataset in any way :-)

```python
# write here
```
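Conditions on columns are combined with `&` (and) or `|` (or), not with Python's `and`/`or`, and each condition needs its own parentheses. A sketch on made-up toy rows with assumed column names `placeLabel` and `birthyear`:

```python
import pandas as pd

# Made-up toy rows; column names are an assumption, not read from the real CSV
df = pd.DataFrame(
    [('Antonio Rossi', 'Verona', 1450), ('Paolo Antonio Bianchi', 'Rome', 1687),
     ('Cesare Verdi', 'Verona', 1820), ('Marco Neri', 'Catania', -500),
     ('Cesare Verdi', 'Verona', 1820), ('Luca Gialli', 'Catania', 1600)],
    columns=['subjLabel', 'placeLabel', 'birthyear'])

# Parentheses are required: & binds more tightly than ==
mask = (df['placeLabel'] == 'Catania') & (df['birthyear'] == -500)
result = df[mask]
print(result)
```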


## Verona after 1500

Display all people born in Verona after the year 1500

```python
# write here
```
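The same pattern works with an ordering comparison. A sketch on made-up toy rows (assumed column names); in the real data, any row whose `birthyear` is `NaN` would simply be excluded, because `NaN > 1500` evaluates to `False`.

```python
import pandas as pd

# Made-up toy rows; column names are an assumption, not read from the real CSV
df = pd.DataFrame(
    [('Antonio Rossi', 'Verona', 1450), ('Paolo Antonio Bianchi', 'Rome', 1687),
     ('Cesare Verdi', 'Verona', 1820), ('Marco Neri', 'Catania', -500),
     ('Cesare Verdi', 'Verona', 1820), ('Luca Gialli', 'Catania', 1600)],
    columns=['subjLabel', 'placeLabel', 'birthyear'])

# "After 1500" read as strictly greater than 1500
after_1500 = df[(df['placeLabel'] == 'Verona') & (df['birthyear'] > 1500)]
print(after_1500)
```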


## First Antonio

Display all people with Antonio as their first name

```python
# write here
```
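String columns expose vectorized string methods through the `.str` accessor. A sketch using `str.startswith` on made-up toy rows, assuming the poet's name is in a column called `subjLabel`:

```python
import pandas as pd

# Made-up toy rows; column names are an assumption, not read from the real CSV
df = pd.DataFrame(
    [('Antonio Rossi', 'Verona', 1450), ('Paolo Antonio Bianchi', 'Rome', 1687),
     ('Cesare Verdi', 'Verona', 1820), ('Marco Neri', 'Catania', -500),
     ('Cesare Verdi', 'Verona', 1820), ('Luca Gialli', 'Catania', 1600)],
    columns=['subjLabel', 'placeLabel', 'birthyear'])

# The trailing space stops longer names such as 'Antoniotto ...' from matching;
# na=False treats missing labels as non-matching
first_antonio = df[df['subjLabel'].str.startswith('Antonio ', na=False)]
print(first_antonio)
```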


## Some Antonio

Display all people with Antonio as one of their names (so also include `'Paolo Antonio Rolli'`)

```python
# write here
```
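`str.contains` interprets its pattern as a regular expression by default, so the word-boundary marker `\b` can require `Antonio` to appear as a whole word anywhere in the name. A sketch on made-up toy rows with an assumed `subjLabel` column:

```python
import pandas as pd

# Made-up toy rows; column names are an assumption, not read from the real CSV
df = pd.DataFrame(
    [('Antonio Rossi', 'Verona', 1450), ('Paolo Antonio Bianchi', 'Rome', 1687),
     ('Cesare Verdi', 'Verona', 1820), ('Marco Neri', 'Catania', -500),
     ('Cesare Verdi', 'Verona', 1820), ('Luca Gialli', 'Catania', 1600)],
    columns=['subjLabel', 'placeLabel', 'birthyear'])

# \b marks a word boundary: 'Antonio' must be a whole word, in any position
some_antonio = df[df['subjLabel'].str.contains(r'\bAntonio\b', na=False)]
print(some_antonio)
```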


## Cesares during 1800

Display all people named Cesare who were born in the 1800s (years 1800 to 1899)

```python
# write here
```
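A string condition and a numeric range can be combined in a single mask; `Series.between` checks a closed interval (both ends included by default). A sketch on made-up toy rows with assumed column names `subjLabel` and `birthyear`:

```python
import pandas as pd

# Made-up toy rows; column names are an assumption, not read from the real CSV
df = pd.DataFrame(
    [('Antonio Rossi', 'Verona', 1450), ('Paolo Antonio Bianchi', 'Rome', 1687),
     ('Cesare Verdi', 'Verona', 1820), ('Marco Neri', 'Catania', -500),
     ('Cesare Verdi', 'Verona', 1820), ('Luca Gialli', 'Catania', 1600)],
    columns=['subjLabel', 'placeLabel', 'birthyear'])

is_cesare = df['subjLabel'].str.contains(r'\bCesare\b', na=False)
in_1800s = df['birthyear'].between(1800, 1899)  # 1800 <= year <= 1899
cesares = df[is_cesare & in_1800s]
print(cesares)
```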


## Sorting

Show poets sorted by year of birth

```python
# write here
```
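`DataFrame.sort_values` returns a new, sorted frame and leaves the original unchanged. A sketch on made-up toy rows with an assumed `birthyear` column:

```python
import pandas as pd

# Made-up toy rows; column names are an assumption, not read from the real CSV
df = pd.DataFrame(
    [('Antonio Rossi', 'Verona', 1450), ('Paolo Antonio Bianchi', 'Rome', 1687),
     ('Cesare Verdi', 'Verona', 1820), ('Marco Neri', 'Catania', -500),
     ('Cesare Verdi', 'Verona', 1820), ('Luca Gialli', 'Catania', 1600)],
    columns=['subjLabel', 'placeLabel', 'birthyear'])

by_year = df.sort_values('birthyear')  # ascending; missing years, if any, go last
print(by_year)
```

Passing `ascending=False` would list the most recent poets first.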


## Where poets are born

Find the 5 cities with most poets, sorted from most to least.

* use `groupby` and `sort_values` methods

```python
# write here
```
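One way to chain the two suggested methods, sketched on made-up toy rows with an assumed `placeLabel` column: `groupby(...).size()` counts the rows per city, `sort_values(ascending=False)` puts the largest counts first, and `head(5)` keeps the top five.

```python
import pandas as pd

# Made-up toy rows; column names are an assumption, not read from the real CSV
df = pd.DataFrame(
    [('Antonio Rossi', 'Verona', 1450), ('Paolo Antonio Bianchi', 'Rome', 1687),
     ('Cesare Verdi', 'Verona', 1820), ('Marco Neri', 'Catania', -500),
     ('Cesare Verdi', 'Verona', 1820), ('Luca Gialli', 'Catania', 1600)],
    columns=['subjLabel', 'placeLabel', 'birthyear'])

# Rows per city, largest first, top 5 (the toy frame has only 3 cities)
top5 = df.groupby('placeLabel').size().sort_values(ascending=False).head(5)
print(top5)
```

`df['placeLabel'].value_counts().head(5)` is a shorter equivalent, since `value_counts` already sorts in descending order.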


## Duplicated poets

Find the first 10 duplicated poets

```python
# write here
```
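A poet can appear in several rows: with SPARQL join semantics, a person with more than one birth place or birth date in Wikidata yields one row per combination. A sketch using `DataFrame.duplicated` on made-up toy rows, assuming names are in a `subjLabel` column. Comparing on the entity URI column (`subj` in the query), if present, would avoid conflating different people who share a name.

```python
import pandas as pd

# Made-up toy rows; column names are an assumption, not read from the real CSV
df = pd.DataFrame(
    [('Antonio Rossi', 'Verona', 1450), ('Paolo Antonio Bianchi', 'Rome', 1687),
     ('Cesare Verdi', 'Verona', 1820), ('Marco Neri', 'Catania', -500),
     ('Cesare Verdi', 'Verona', 1820), ('Luca Gialli', 'Catania', 1600)],
    columns=['subjLabel', 'placeLabel', 'birthyear'])

# keep='first' (the default) flags every occurrence except the first one;
# keep=False would flag all copies instead
dup_mask = df.duplicated(subset='subjLabel')
dups = df[dup_mask].head(10)
print(dups)
```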
