# Basic Pandas Homework

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## DataFrame and Series

Create a `DataFrame` named `df1` with the following properties:

* 10 rows.
* An `age` column with random ages between 0 and 100 (inclusive).
* A `cell_phone` column of randomly sampled categorical values `ios`, `android`, `windows`.
* A `gender` column of randomly sampled categorical values `f` and `m`.
* The order of columns should be `gender`, `age`, `cell_phone`.
* A row index consisting of lowercase alphabetical letters.

Use NumPy's random sampling functions to generate the column values for this `DataFrame`.

In [2]:
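rows = 10
d = {
    "gender": np.random.choice(['m', 'f'], rows),
    # randint excludes the upper bound, so 101 keeps an age of 100 possible.
    # (np.random.random_integers is deprecated and absent from recent NumPy.)
    "age": np.random.randint(0, 101, rows, dtype='int64'),
    "cell_phone": np.random.choice(['ios', 'android', 'windows'], rows)
}

# chr(97) is 'a', so the index runs over the first `rows` lowercase letters.
df1 = pd.DataFrame(d, index=list(map(chr, range(97, 97 + rows))), columns=['gender', 'age', 'cell_phone'])
# Convert the sampled string columns to the categorical dtype.
df1['cell_phone'] = df1['cell_phone'].astype('category')
df1['gender'] = df1['gender'].astype('category')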

In [3]:
df1

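|   | gender | age | cell_phone |
|---|--------|-----|------------|
| a | m | 18 | android |
| b | m | 32 | windows |
| c | m | 61 | ios |
| d | f | 83 | android |
| e | m | 45 | windows |
| f | m | 56 | ios |
| g | f | 35 | windows |
| h | m | 68 | ios |
| i | m | 32 | ios |
| j | f | 4 | windows |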
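
Because the values are sampled at random, the table above changes on every run. As a minimal sketch of a repeatable variant, the block below uses NumPy's `Generator` API with a fixed seed. The seed value and the names `rng`, `n_rows`, and `sketch` are assumptions made for illustration and are not part of the assignment.

```python
import string

import numpy as np
import pandas as pd

# Hypothetical seed, chosen only so that the sketch is repeatable.
rng = np.random.default_rng(0)
n_rows = 10

sketch = pd.DataFrame(
    {
        "gender": pd.Categorical(rng.choice(["f", "m"], n_rows), categories=["f", "m"]),
        # endpoint=True makes the upper bound (100) inclusive.
        "age": rng.integers(0, 100, n_rows, endpoint=True),
        "cell_phone": pd.Categorical(
            rng.choice(["ios", "android", "windows"], n_rows),
            categories=["ios", "android", "windows"],
        ),
    },
    # First n_rows lowercase letters: 'a' through 'j'.
    index=list(string.ascii_lowercase[:n_rows]),
)
```

Passing `categories` explicitly keeps all three phone types (and both genders) in the dtype even if one of them happens not to be drawn in a small sample.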


Make sure your code passes the following `assert` statements:

In [4]:
assert 'a' in df1.index
assert 'j' in df1.index
assert 'age' in df1.columns
assert 'cell_phone' in df1.columns
assert 'gender' in df1.columns
assert df1.age.dtype.name=='int64'
assert df1.cell_phone.dtype.name=='category'
assert df1.gender.dtype.name=='category'
assert list(df1.columns)==['gender','age','cell_phone']

Create a new `DataFrame` named `df2` with the following transformations:

* Extract rows `a` through `g`.
* Extract the `age` and `gender` columns, but put `age` first.
* Reverse the rows so they run `g` to `a`.
* Add a new column named `income` holding random integer dollar amounts in the range [0, 10000].
* Add a new column named `expenses` holding random integer dollar amounts in the range [0, 10000].
* Create a new column named `profit`, computed as `income` minus `expenses`.

In [5]:
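# Label slices with .loc include both endpoints; select age first, then reverse the rows.
# .copy() avoids a SettingWithCopyWarning when columns are added below.
df2 = df1.loc['a':'g', ['age', 'gender']].iloc[::-1].copy()
# randint excludes the upper bound, so 10001 keeps 10000 possible.
df2['income'] = np.random.randint(0, 10001, len(df2), dtype='int64')
df2['expenses'] = np.random.randint(0, 10001, len(df2), dtype='int64')
df2['profit'] = df2['income'] - df2['expenses']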

In [6]:
df2

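|   | age | gender | income | expenses | profit |
|---|-----|--------|--------|----------|--------|
| g | 35 | f | 3908 | 5198 | -1290 |
| f | 56 | m | 7925 | 9473 | -1548 |
| e | 45 | m | 5819 | 2743 | 3076 |
| d | 83 | f | 2399 | 572 | 1827 |
| c | 61 | m | 211 | 9262 | -9051 |
| b | 32 | m | 4487 | 7100 | -2613 |
| a | 18 | m | 9983 | 4687 | 5296 |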
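
The same sequence of transformations can also be written as a single method chain. The block below is a sketch under stated assumptions: `base` is a small stand-in for `df1`, and `rng`, `sketch2`, and the seed are hypothetical names and values introduced only for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # hypothetical seed for repeatability

# Small stand-in for df1 so that the sketch runs on its own.
base = pd.DataFrame(
    {
        "gender": list("mmmfmmfmmf"),
        "age": [18, 32, 61, 83, 45, 56, 35, 68, 32, 4],
    },
    index=list("abcdefghij"),
)

sketch2 = (
    base.loc["a":"g", ["age", "gender"]]  # label slice includes 'g'
    .iloc[::-1]                           # reverse the rows: g, f, ..., a
    .assign(
        income=lambda d: rng.integers(0, 10000, len(d), endpoint=True),
        expenses=lambda d: rng.integers(0, 10000, len(d), endpoint=True),
        # Later keyword arguments can refer to columns created earlier in the same call.
        profit=lambda d: d["income"] - d["expenses"],
    )
)
```

`assign` returns a new `DataFrame` rather than modifying a slice of the original, so the source frame is left untouched.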


Make sure your code passes the following `assert` statements:

In [7]:
assert df2.index[0]=='g'
assert df2.index[-1]=='a'
assert list(df2.columns)==['age','gender','income','expenses','profit']
assert df2.income.dtype.name=='int64'
assert df2.expenses.dtype.name=='int64'
assert all(df2.profit+df2.expenses-df2.income==0)

Write a function named `mean_and_std` that takes a `Series` object and returns a new `Series` that contains the mean and standard deviation of the original series.

In [8]:
def mean_and_std(s):
    """Compute the mean and std of series s.
    
    Parameters
    ----------
    s : Series
    
    Returns
    -------
    Series containing the mean and std of s with an index of `mean` and `std`.
    """
    return s.describe()[['mean', 'std']]
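
As an illustration of what the two returned values mean, the sketch below uses a hypothetical variant, `mean_and_std_agg`, built on `Series.agg`, together with a small made-up series. Note that pandas reports the sample standard deviation (`ddof=1`) by default, both in `describe()` and in `std()`.

```python
import pandas as pd


def mean_and_std_agg(s):
    """Hypothetical variant of mean_and_std that uses Series.agg."""
    return s.agg(["mean", "std"])


values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])
result = mean_and_std_agg(values)

# The squared deviations from the mean (5.0) sum to 32.
sample_std = (32 / 7) ** 0.5             # divides by n - 1, the pandas default
population_std = values.std(ddof=0)      # divides by n, which gives 2.0 here
```

For this series `result['mean']` is 5.0 and `result['std']` matches `sample_std` (about 2.138), not the population value of 2.0.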

Make sure your code passes the following `assert` statements:

In [9]:
assert list(mean_and_std(df2.age).index)==['mean','std']
assert mean_and_std(df2.age)['mean']==df2.age.describe()['mean']
assert mean_and_std(df2.age)['std']==df2.age.describe()['std']

Use the `.apply()` method with `mean_and_std` to compute the mean and standard deviation of only the columns `income`, `expenses`, and `profit`. Save the result in a variable named `stats`.

In [10]:
stats = df2[['income', 'expenses', 'profit']].apply(mean_and_std)

In [11]:
stats

|      | income | expenses | profit |
|------|--------|----------|--------|
| mean | 4961.714286 | 5576.428571 | -614.714286 |
| std | 3296.507481 | 3293.804531 | 4672.811064 |
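
To make the shape of this result easier to see, here is a small sketch on made-up numbers. `DataFrame.apply` calls the function once per column by default, and because each call returns a `Series` indexed by `mean` and `std`, the pieces are assembled into a frame with those two rows. The names `toy`, `toy_stats`, and `same` are assumptions for illustration only.

```python
import pandas as pd


def mean_and_std(s):
    # Same approach as the homework function: pick two rows out of describe().
    return s.describe()[["mean", "std"]]


toy = pd.DataFrame({"income": [10, 20, 30], "expenses": [5, 5, 20]})
toy["profit"] = toy["income"] - toy["expenses"]

# One call per column; the returned Series become the columns of the result.
toy_stats = toy.apply(mean_and_std)

# DataFrame.agg with a list of function names produces the same layout.
same = toy.agg(["mean", "std"])
```

Here `toy_stats.loc['mean', 'income']` is 20.0 and `toy_stats.loc['std', 'profit']` is 5.0, and `same` holds the same values.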


Make sure your code passes the following `assert` statements:

In [12]:
d = df2[['income','expenses','profit']].describe()
assert 'income' in stats.columns
assert 'expenses' in stats.columns
assert 'profit' in stats.columns
assert stats.loc['mean','income']==d.loc['mean','income']
assert stats.loc['mean','expenses']==d.loc['mean','expenses']
assert stats.loc['mean','profit']==d.loc['mean','profit']
assert stats.loc['std','income']==d.loc['std','income']
assert stats.loc['std','expenses']==d.loc['std','expenses']
assert stats.loc['std','profit']==d.loc['std','profit']
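
The element-by-element checks above can also be expressed as a single frame comparison. The following is a sketch on made-up data, assuming the pandas testing helper `pd.testing.assert_frame_equal`, which compares values (with a floating-point tolerance by default), labels, and dtypes in one call. The names `frame`, `summary`, and `picked` are hypothetical.

```python
import pandas as pd

frame = pd.DataFrame({"income": [10, 20, 30], "expenses": [5, 5, 20]})

summary = frame.describe()
picked = frame.apply(lambda s: s.describe()[["mean", "std"]])

# Raises AssertionError with a readable diff if the two frames differ.
pd.testing.assert_frame_equal(picked, summary.loc[["mean", "std"]])
```

A failure message from this helper names the first column that differs, which tends to be easier to read than a bare `assert` on a single cell.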