### Learning Pandas and Numpy

This notebook contains some tips and tricks that I learned while working with pandas and NumPy!

For this tutorial, I generate a toy dataset that records several people and their summer activities.

In [5]:
import pandas as pd
import numpy as np
import random
import time

In [6]:
def strTimeProp(start, end, format, prop):
    """Get a time at a proportion of a range of two formatted times.

    start and end should be strings specifying times formatted in the
    given format (strftime-style), giving an interval [start, end].
    prop specifies the proportion of the interval to be taken after
    start.  The returned time will be in the specified format.
    """

    stime = time.mktime(time.strptime(start, format))
    etime = time.mktime(time.strptime(end, format))

    ptime = stime + prop * (etime - stime)

    return time.strftime(format, time.localtime(ptime))


def randomDate(start, end, prop):
    return strTimeProp(start, end, '%m/%d/%Y %I:%M %p', prop)

In [11]:
randomDate("07/08/2017 05:50 AM", "07/08/2017 08:30 AM", random.random())

'07/08/2017 08:01 AM'
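
The interpolation that `strTimeProp` performs can also be sketched with the standard `datetime` module, which avoids the round trip through local-time epoch seconds. This is only an illustrative alternative, not the notebook's method: the name `time_at_proportion` and the constant `FMT` are made up for this example, and the printed value assumes an English locale for the AM/PM marker.

```python
from datetime import datetime

FMT = '%m/%d/%Y %I:%M %p'  # same format string that randomDate uses

def time_at_proportion(start, end, prop, fmt=FMT):
    """Return the time that lies `prop` of the way from start to end."""
    start_dt = datetime.strptime(start, fmt)
    end_dt = datetime.strptime(end, fmt)
    # A timedelta can be scaled by a float, so this is plain linear interpolation
    return (start_dt + prop * (end_dt - start_dt)).strftime(fmt)

# Halfway between 05:50 and 08:30 (a 160-minute interval) is 80 minutes after 05:50
midpoint = time_at_proportion("07/08/2017 05:50 AM", "07/08/2017 08:30 AM", 0.5)
print(midpoint)  # 07/08/2017 07:10 AM
```

Passing `random.random()` as `prop` gives a random time in the interval, just as `randomDate` does.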

In [18]:
def generate_dummy_data():
    persons = ['Thomas Col', 'Mitbal Kurzweil', 'Addy Bangoen',
               'Aqur Ruqi', 'Jane Ziporag', 'Raphael Varine', 'Benitez Alpha', 'Suzan Mort']
    activities = ['Football', 'Travel', 'Beach', 'Animes Marathon', 'Basketball', 'Singing']

    # Build 50 rows, each pairing a random person with a random activity and time
    data = []
    for _ in range(50):
        data.append({
            'name': random.choice(persons),
            'activity': random.choice(activities),
            # randomDate already returns a formatted string
            'timestamp': randomDate("07/08/2017 05:50 AM",
                                    "07/08/2017 08:30 AM", random.random())
        })
    df = pd.DataFrame(data)
    return df

In [19]:
df = generate_dummy_data()

In [21]:
df.head()

     activity             name            timestamp
0  Basketball        Aqur Ruqi  07/08/2017 07:17 AM
1      Travel   Raphael Varine  07/08/2017 06:09 AM
2    Football   Raphael Varine  07/08/2017 07:31 AM
3      Travel  Mitbal Kurzweil  07/08/2017 07:19 AM
4      Travel     Jane Ziporag  07/08/2017 07:09 AM
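
Because the toy data is drawn with `random`, every run produces different rows. If reproducible output is wanted, one option is to draw from a seeded generator. The sketch below is a reduced, hypothetical variant of the generator (the name `make_toy_rows`, the `seed` parameter, and the shortened lists are assumptions made for this example), not a change to the notebook's own function.

```python
import random
import pandas as pd

def make_toy_rows(seed, n_rows=5):
    # A private Random instance leaves the global random state untouched
    rng = random.Random(seed)
    persons = ['Thomas Col', 'Suzan Mort', 'Jane Ziporag']
    activities = ['Football', 'Travel', 'Beach']
    rows = [{'name': rng.choice(persons), 'activity': rng.choice(activities)}
            for _ in range(n_rows)]
    return pd.DataFrame(rows)

first = make_toy_rows(seed=42)
second = make_toy_rows(seed=42)
print(first.equals(second))  # True: same seed, same rows
```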


### 1. String Commands

For string manipulation, prefer pandas' vectorized string methods, which are available through the **.str** accessor.

For example, we can split the **name** column, which holds each person's full name, into two separate columns, **first_name** and **last_name**.

In [30]:
df[['first_name', 'last_name']] = df.name.str.split(" ", expand=True)

In [31]:
df.head()

     activity             name            timestamp first_name last_name
0  Basketball        Aqur Ruqi  07/08/2017 07:17 AM       Aqur      Ruqi
1      Travel   Raphael Varine  07/08/2017 06:09 AM    Raphael    Varine
2    Football   Raphael Varine  07/08/2017 07:31 AM    Raphael    Varine
3      Travel  Mitbal Kurzweil  07/08/2017 07:19 AM     Mitbal  Kurzweil
4      Travel     Jane Ziporag  07/08/2017 07:09 AM       Jane   Ziporag
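
The split above works because every name in the toy dataset has exactly two words. The following sketch shows a few more `.str` methods and one way to handle longer names with `n=1`. The small frames are made up for illustration, and 'Mary Ann Smith' is a hypothetical name that does not appear in the dataset.

```python
import pandas as pd

# Hypothetical mini-frame using two names from the toy dataset
people = pd.DataFrame({'name': ['Thomas Col', 'Suzan Mort']})

# Each .str method is applied element-wise to the whole column
people['upper'] = people['name'].str.upper()
people['name_length'] = people['name'].str.len()
people['initial'] = people['name'].str[0]
people['has_mort'] = people['name'].str.contains('Mort')
print(people)

# n=1 splits only at the first space, so the result always has two columns
full = pd.Series(['Jane Ziporag', 'Mary Ann Smith'])
parts = full.str.split(' ', n=1, expand=True)
print(parts)
```

With `n=1`, everything after the first space stays together in the second column, so the two-column assignment keeps working for names with more than two words.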


### 2. Group by and value_counts

We can group by one column and, within each group, count how often each value of another column occurs using **value_counts**.

On this dataset, we can count how many times each person did each activity.
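
As a minimal, self-contained sketch of the same pattern, the example below uses a tiny made-up frame (the names 'Ann' and 'Bob' are hypothetical) so that the counts are easy to check by hand. It also shows `unstack`, which is one way to turn the result into a person-by-activity table.

```python
import pandas as pd

toy = pd.DataFrame({
    'name': ['Ann', 'Ann', 'Ann', 'Bob', 'Bob'],
    'activity': ['Beach', 'Beach', 'Travel', 'Travel', 'Singing'],
})

# The result is a Series indexed by (name, activity) pairs
counts = toy.groupby('name')['activity'].value_counts()
print(counts.loc[('Ann', 'Beach')])  # 2

# unstack moves the activity level into columns; pairs that never occur become 0
table = counts.unstack(fill_value=0)
print(table)
```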

In [32]:
df.groupby('name')['activity'].value_counts()

name             activity       
Addy Bangoen     Football           1
                 Singing            1
Aqur Ruqi        Singing            4
                 Animes Marathon    2
                 Basketball         1
                 Football           1
Benitez Alpha    Basketball         3
                 Football           1
                 Travel             1
Jane Ziporag     Beach              1
                 Football           1
                 Travel             1
Mitbal Kurzweil  Travel             4
                 Animes Marathon    1
                 Basketball         1
                 Football           1
Raphael Varine   Football           3
                 Animes Marathon    1
                 Basketball         1
                 Travel             1
Suzan Mort       Beach              4
                 Animes Marathon    3
                 Basketball         2
                 Football           2
                 Singing            1
Thomas Col       