Prerequisites: select some users and save them
as a CSV file in `./demo/users.csv`. Include the column names as a header row!

```sql
select id, name, surname, signupDate
from User
limit 1000000;
```

Effortless shell integration:

In [None]:
!head -n 5 ../data/users.csv

We can read from CSV in one line:

In [None]:
import pandas as pd
df = pd.read_csv('../data/users.csv',
				 parse_dates=['signupDate'])
df.describe()

Many ways to look inside:

In [None]:
df.sample(5)

In [None]:
df.head(2)

In [None]:
df.tail(3)

How many people have both names set?

In [None]:
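# Drop only the rows where a name or surname is missing, then count what is left
df.dropna(subset=['name', 'surname']).count()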


How many full namesakes live in Yola?

In [None]:
namesakes = df.value_counts(['name', 'surname'])
namesakes.head(10)

In [None]:
# The same, done more manually (for self-testing purposes)

df_copy = df.copy()
df_copy['cnt'] = 1
# Select 'cnt' before summing so that other numeric columns (e.g. id) are not summed too
df_copy.groupby(['name', 'surname'])['cnt'].sum().sort_values(ascending=False)

In [None]:
namesakes.iloc[10:20]
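
To answer the namesake question with a single number, one option is to keep only the name/surname pairs that occur more than once. The sketch below runs on a tiny made-up table (the `toy` frame and its values are hypothetical, not real user data) and also cross-checks `value_counts` against `groupby(...).size()`:

```python
import pandas as pd

# Hypothetical toy data standing in for the users table.
toy = pd.DataFrame({
    'name': ['Anna', 'Anna', 'Bob', 'Anna', 'Bob'],
    'surname': ['Lee', 'Lee', 'Ray', 'Kim', 'Ray'],
})

# Count how often each (name, surname) pair occurs.
counts = toy.value_counts(['name', 'surname'])

# Cross-check: groupby + size yields the same counts (in a different order).
via_groupby = toy.groupby(['name', 'surname']).size()

# Keep only the pairs shared by at least two people.
shared = counts[counts > 1]
people_with_namesake = int(shared.sum())

print(len(shared), 'shared full names,', people_with_namesake, 'people involved')
```

On the real data, the same filter applied to `namesakes` should give the number of full names that have at least one namesake.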

Are there any Antons? Please export a few of them as Python dictionaries.

In [None]:
df[df['name'] == 'Anton'].sample(3).to_dict(orient='records')
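
Note that `sample(3)` raises a `ValueError` when fewer than three rows match. A minimal sketch of a more defensive variant, using a hypothetical toy frame with only two Antons:

```python
import pandas as pd

# Hypothetical toy data; the real notebook uses the users CSV instead.
toy = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Anton', 'Olga', 'Anton'],
    'surname': ['Ivanov', 'Petrova', 'Sidorov'],
})

antons = toy[toy['name'] == 'Anton']

# Never ask for more rows than there are matches.
n = min(3, len(antons))

# orient='records' yields one dictionary per row.
records = antons.sample(n, random_state=0).to_dict(orient='records')
print(records)
```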

When were the happiest days in Yola?

In [None]:
signup_by_day = df.apply(
	lambda row: row.signupDate.date(),
	axis='columns').value_counts()
signup_by_day.head(5)

In [None]:
# Same, but using swifter to speed up apply

import swifter  # noqa
signup_by_day = df.swifter.apply(  # noqa
	lambda row: row.signupDate.date(),
	axis='columns').value_counts()
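
For a datetime column, a row-wise `apply` is not strictly needed: the `.dt` accessor extracts the date from the whole column at once. The sketch below compares both approaches on a small made-up frame (the `toy` data is hypothetical):

```python
import pandas as pd

# Hypothetical toy data with the same column name as the users table.
toy = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'signupDate': pd.to_datetime([
        '2020-01-01 09:15', '2020-01-01 18:40',
        '2020-01-02 07:05', '2020-01-01 23:59',
    ]),
})

# Row-wise version, as in the cells above.
by_apply = toy.apply(
    lambda row: row.signupDate.date(),
    axis='columns').value_counts()

# Vectorized version: .dt.date works on the whole column at once.
by_dt = toy['signupDate'].dt.date.value_counts()

print(by_dt.head())
```

Both give the same counts per day; on a large table the vectorized form is usually much faster.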

In [None]:
# Embed plots in the notebook

import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
label_settings = dict(fontsize='xx-large')
plt.figure(figsize=(30, 20))
plt.rc('font', **{'size': 32})
plt.xlabel('Day', **label_settings)
plt.ylabel('User Signup Count', **label_settings)
# value_counts() sorts by count, so restore chronological order before plotting
signup_by_day.sort_index().plot()

Let's analyse our users by gender

In [None]:
gender_data_url = 'https://raw.githubusercontent.com/OpenGenderTracking/globalnamedata/master/assets/usprocessed.csv'
gender_data = pd.read_csv(gender_data_url).set_index('Name')
gender_data.sample(10)

In [None]:
df['name'] = df['name'].str.capitalize()
df.sample(5)

In [None]:
# Left join from the users table: keep every user, drop reference names that no user has
merged_df = df.join(gender_data, on='name', how='left')
merged_df.sample(5)

In [None]:
merged_df.value_counts('prob.gender')
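
The join type matters for these counts. An outer join also keeps names from the reference table that match no user, and each of them adds one row to the result. A toy sketch of the difference follows; the frames and the `'Female'`/`'Male'` labels are hypothetical stand-ins, not taken from the real dataset:

```python
import pandas as pd

# Hypothetical users and gender reference tables.
users = pd.DataFrame({'id': [1, 2, 3], 'name': ['Anna', 'Anna', 'Zed']})
gender = pd.DataFrame({
    'Name': ['Anna', 'Bob'],
    'prob.gender': ['Female', 'Male'],
}).set_index('Name')

# Left join from users: one row per user, NaN where the name is unknown.
left = users.join(gender, on='name', how='left')

# Outer join on the index: also keeps 'Bob', who is not a user at all.
outer = gender.join(users.set_index('name'), how='outer')

left_counts = left['prob.gender'].value_counts()
outer_counts = outer['prob.gender'].value_counts()

print(left_counts)
print(outer_counts)
```

Here the outer join reports one 'Male' although no user has a name from that row, so a left join from the users table is the safer basis for per-user statistics.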
