# Pandas Objects

In [1]:
import pandas as pd
import numpy as np

## Series

Create a `Series` named `s1` with the following properties:

* 7 values that are the first 7 lowercase letters.
* An index of the days of the week capitalized, starting with `Sunday`.

In [2]:
s1 = pd.Series(['a', 'b', 'c', 'd', 'e', 'f', 'g'],
               index=['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'])

In [3]:
s1

Sunday       a
Monday       b
Tuesday      c
Wednesday    d
Thursday     e
Friday       f
Saturday     g
dtype: object

In [4]:
v = s1.values
for char in 'abcdefg':
    assert char in v
assert 'Sunday' in s1
assert 'Monday' in s1
assert 'Tuesday' in s1
assert 'Wednesday' in s1
assert 'Thursday' in s1
assert 'Friday' in s1
assert 'Saturday' in s1
assert s1.iloc[0] == 'a'
assert s1.index[0] == 'Sunday'

Use the `.loc` indexer to slice the `s1` `Series` by index values to create a new `Series` named `s2` with only the weekdays.

In [5]:
s2 = s1.loc['Monday':'Friday']

In [6]:
s2

Monday       b
Tuesday      c
Wednesday    d
Thursday     e
Friday       f
dtype: object

In [7]:
v = s2.values
for char in 'bcdef':
    assert char in v
assert 'Monday' in s2
assert 'Tuesday' in s2
assert 'Wednesday' in s2
assert 'Thursday' in s2
assert 'Friday' in s2
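
One detail of the label slice above is easy to miss: `.loc` includes both endpoints, whereas positional slicing with `.iloc` excludes the stop position. The following is a minimal sketch of that difference; the names `days`, `s`, `by_label`, and `by_position` are illustrative and not part of the exercise.

```python
import pandas as pd

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
        'Thursday', 'Friday', 'Saturday']
s = pd.Series(list('abcdefg'), index=days)

# Label-based slice: both 'Monday' and 'Friday' are included.
by_label = s.loc['Monday':'Friday']

# Position-based slice: the stop position (6) is excluded,
# so 1:6 is needed to get the same five rows.
by_position = s.iloc[1:6]

print(by_label.equals(by_position))
```

Writing `s.iloc[1:5]` instead would silently drop `Friday`, which is the usual off-by-one when switching between the two indexers.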

Use the `.iloc` indexer to slice the `s1` `Series` in a manner that reverses its values/index. Name the new `Series` `s3`.

In [8]:
s3 = s1.iloc[::-1]

In [9]:
s3

Saturday     g
Friday       f
Thursday     e
Wednesday    d
Tuesday      c
Monday       b
Sunday       a
dtype: object

In [10]:
assert ''.join(s3.values)=='gfedcba'
assert list(s3.index)==list(reversed(s1.index))
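
Reversing with `.iloc[::-1]` reorders positions but leaves each label attached to its value. A small sketch with a shortened, hypothetical three-day `Series` (the names `s` and `r` are illustrative):

```python
import pandas as pd

s = pd.Series(list('abc'), index=['Sunday', 'Monday', 'Tuesday'])
r = s.iloc[::-1]

# Labels move with their values: 'Sunday' still maps to 'a'.
print(r.loc['Sunday'])

# Position 0 now holds what used to be the last element.
print(r.iloc[0])
```

This is why label-based lookups give the same answer on `s1` and `s3`, while positional lookups do not.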

## DataFrame

Create a `DataFrame` named `df1` with the following properties:

* 10 rows.
* An `age` column with random ages between 0 and 100 (inclusive).
* A `cell_phone` column of randomly sampled categorical values `ios`, `android`, `windows`.
* A `gender` column of randomly sampled categorical values `f` and `m`.
* The order of columns should be `gender`, `age`, `cell_phone`.
* A row index consisting of lowercase alphabetical letters.

In [11]:
# Sample each column at random; randint's upper bound is exclusive, so 101 allows an age of 100.
df1 = pd.DataFrame(
    {'gender': np.random.choice(['f', 'm'], size=10),
     'age': np.random.randint(0, high=101, size=10),
     'cell_phone': np.random.choice(['ios', 'android', 'windows'], size=10)},
    columns=['gender', 'age', 'cell_phone'],
    index=list('abcdefghij'))
# Store the two sampled label columns as categoricals.
df1['gender'] = df1['gender'].astype('category')
df1['cell_phone'] = df1['cell_phone'].astype('category')

In [12]:
df1

  gender  age cell_phone
a      f   98    windows
b      m    4        ios
c      f   21        ios
d      m   33        ios
e      m   91        ios
f      f   81    android
g      m   60    android
h      f   14    windows
i      m   36    android
j      f   68        ios

In [13]:
for char in 'abcdefghij':
    assert char in df1.index
assert 'age' in df1.columns
assert 'cell_phone' in df1.columns
assert 'gender' in df1.columns
assert df1.age.dtype.name=='int64'
assert df1.cell_phone.dtype.name=='category'
assert df1.gender.dtype.name=='category'
assert set(df1.gender.unique())=={'f','m'}
assert set(df1.cell_phone.unique())=={'windows','android','ios'}
assert list(df1.columns)==['gender','age','cell_phone']
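
Because the columns are sampled at random, every run produces different data, and a small sample can by chance miss one of the values entirely. One possible variation, sketched here under assumptions, uses a seeded NumPy `Generator` for repeatable draws and declares the categories explicitly so they are recorded even if unsampled. The name `demo` and the seed value `42` are arbitrary choices, not part of the exercise.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for repeatability
n = 10
phones = ['ios', 'android', 'windows']

demo = pd.DataFrame(
    {
        # Explicit categories are kept even if a value is never drawn.
        'gender': pd.Categorical(rng.choice(['f', 'm'], size=n),
                                 categories=['f', 'm']),
        # The upper bound of integers() is exclusive, so 101 allows 100.
        'age': rng.integers(0, 101, size=n),
        'cell_phone': pd.Categorical(rng.choice(phones, size=n),
                                     categories=phones),
    },
    index=list('abcdefghij'),
)

print(demo.dtypes)
```

The dictionary's insertion order determines the column order here, so no separate `columns` argument is needed.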

Create a new `DataFrame` named `df2` by applying the following transformations to `df1`:

* Extract rows `a` through `g`.
* Extract the `age` and `gender` columns, with `age` first.
* Reverse the rows so they run from `g` to `a`.
* Add a new column named `income` of random integer dollar amounts in the range [0, 10000].
* Add a new column named `expenses` of random integer dollar amounts in the range [0, 10000].
* Add a new column named `profit`, computed as `income` minus `expenses`.

In [14]:
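# Select rows 'a' through 'g' and the two columns (age first), then reverse the row order.
# .copy() makes df2 independent of df1, so the column assignments below are unambiguous.
df2 = df1.loc['a':'g', ['age', 'gender']].iloc[::-1].copy()
# randint's upper bound is exclusive, so 10001 allows a value of 10000.
df2['income'] = np.random.randint(0, 10001, size=len(df2))
df2['expenses'] = np.random.randint(0, 10001, size=len(df2))
df2['profit'] = df2['income'] - df2['expenses']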

In [15]:
df2

   age gender  income  expenses  profit
g   60      m    2481      3341    -860
f   81      f    8177      6108    2069
e   91      m    7658      1759    5899
d   33      m    4778      3219    1559
c   21      f    6651      8530   -1879
b    4      m    6901       179    6722
a   98      f     279      4947   -4668

Make sure your code passes the following `assert` statements:

In [16]:
assert df2.index[0]=='g'
assert df2.index[-1]=='a'
assert list(df2.columns)==['age','gender','income','expenses','profit']
assert df2.income.dtype.name=='int64'
assert df2.expenses.dtype.name=='int64'
assert df2.profit.dtype.name=='int64'
assert all(df2.profit+df2.expenses-df2.income==0)
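
The same sequence of steps can also be written as a single method chain with `DataFrame.assign`, which returns a new frame instead of modifying one in place. This is only a sketch of an alternative style, using a hypothetical three-row frame `base` and an arbitrary seed rather than the exercise's `df1`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed
base = pd.DataFrame({'age': [98, 4, 21], 'gender': ['f', 'm', 'f']},
                    index=list('abc'))

out = (
    base.iloc[::-1]  # reverse the row order
    .assign(
        # Arrays are assigned by position; 10001 is an exclusive upper bound.
        income=rng.integers(0, 10001, size=len(base)),
        expenses=rng.integers(0, 10001, size=len(base)),
    )
    # A callable receives the intermediate frame, so profit can use the new columns.
    .assign(profit=lambda d: d['income'] - d['expenses'])
)

print(out)
```

Since `assign` never touches `base`, this style avoids any question of whether a slice is a view or a copy.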

Using row filtering, column selection, and the `.mean()` method, calculate and print the average age for men and women in the `df1` `DataFrame`:

In [17]:
man_age = df1.loc[df1['gender'] == 'm', 'age'].mean()
print(f"average male age: {man_age}")
woman_age = df1.loc[df1['gender'] == 'f', 'age'].mean()
print(f"average female age: {woman_age}")

average male age: 44.8
average female age: 56.4
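
For comparison, the two filtered means can also be obtained in one step with `groupby`, although the exercise itself asks for row filtering. The sketch below rebuilds the `gender` and `age` columns from the `df1` output shown earlier; the names `people` and `mean_age` are illustrative.

```python
import pandas as pd

# Values copied from the df1 output shown earlier in this notebook.
people = pd.DataFrame(
    {'gender': list('fmfmmfmfmf'),
     'age': [98, 4, 21, 33, 91, 81, 60, 14, 36, 68]},
    index=list('abcdefghij'),
)

# One mean per distinct gender value, returned as a Series indexed by gender.
mean_age = people.groupby('gender')['age'].mean()
print(mean_age)
```

The result holds the same two averages as the filtered version, and it scales to any number of groups without repeating the filter.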


Use the `.iloc` indexer on `df1` to extract every other row of the last column. Save the resulting `Series` as `s4`:

In [18]:
s4 = df1.iloc[::2, -1]

In [19]:
assert len(s4)==5
assert list(s4.index)==list('acegi')
assert s4.name=='cell_phone'
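
The result above is a `Series` because the column selector is a single integer. Passing a list instead keeps the result two-dimensional. A minimal sketch with a hypothetical frame `df`:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': list('abcd')},
                  index=list('wxyz'))

# Scalar column position: the result is a Series named after the column.
as_series = df.iloc[::2, -1]

# List of column positions: the result stays a DataFrame with one column.
as_frame = df.iloc[::2, [-1]]

print(type(as_series).__name__, type(as_frame).__name__)
```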

Use the `.loc` indexer to extract all rows and just the `gender` and `age` columns. Save the resulting `DataFrame` as `df3`.

In [20]:
df3 = df1.loc[:, ['gender', 'age']]

In [21]:
df3

  gender  age
a      f   98
b      m    4
c      f   21
d      m   33
e      m   91
f      f   81
g      m   60
h      f   14
i      m   36
j      f   68

In [22]:
assert list(df3.columns)==['gender','age']
assert len(df3)==10
assert list(df3.index)==list('abcdefghij')
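
For reference, `.loc` accepts several equivalent spellings of this column selection. The sketch below uses a hypothetical three-row frame `df` with the same column names to compare a list of labels, a label slice, and plain bracket indexing.

```python
import pandas as pd

df = pd.DataFrame(
    {'gender': ['f', 'm', 'f'],
     'age': [98, 4, 21],
     'cell_phone': ['windows', 'ios', 'ios']},
    index=list('abc'),
)

by_list = df.loc[:, ['gender', 'age']]   # explicit list of column labels
by_slice = df.loc[:, 'gender':'age']     # label slice, both endpoints included
by_brackets = df[['gender', 'age']]      # shorthand for selecting columns

print(by_list.equals(by_slice) and by_list.equals(by_brackets))
```

The label slice depends on the column order, so the list form is the safer choice when columns might be rearranged.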