# Pandas Objects

In [1]:
import pandas as pd
import numpy as np

## Series

Create a `Series` named `s1` with the following properties:

* 7 values that are the first 7 lowercase letters.
* An index of the days of the week capitalized, starting with `Sunday`.

In [2]:
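import calendar
# Sets the first weekday for the module's calendar-layout functions;
# calendar.day_name itself still runs Monday..Sunday (see the next cell).
calendar.setfirstweekday(calendar.SUNDAY)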

In [3]:
for day in calendar.day_name:
    print(day)

Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday
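
The output above shows that `calendar.day_name` starts with Monday even after `calendar.setfirstweekday(calendar.SUNDAY)`; that setting only affects functions that lay out weeks, such as `calendar.monthcalendar`. A minimal sketch of one way to build a Sunday-first list by rotating the names with slicing (the names `names` and `sunday_first` are illustrative, not part of the exercise):

```python
import calendar

# calendar.day_name always runs Monday..Sunday, regardless of setfirstweekday.
names = list(calendar.day_name)

# Rotate the last element (Sunday) to the front.
sunday_first = names[-1:] + names[:-1]
print(sunday_first)
```

Slicing builds a new list and leaves `names` untouched, which avoids the insert-then-trim bookkeeping of an in-place rotation.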


In [4]:
lowercaseletters = list('abcdefg')
days = list(calendar.day_name)
# Move Sunday from the end of the list to the front.
days = days[-1:] + days[:-1]
s1 = pd.Series(data=lowercaseletters, index=days)

In [5]:
v = s1.values
for char in 'abcdefg':
    assert char in v
assert 'Sunday' in s1
assert 'Monday' in s1
assert 'Tuesday' in s1
assert 'Wednesday' in s1
assert 'Thursday' in s1
assert 'Friday' in s1
assert 'Saturday' in s1
assert s1.iloc[0] == 'a'
assert s1.index[0] == 'Sunday'

Use the `.loc` indexer to slice the `s1` `Series` by index values to create a new `Series` named `s2` with only the weekdays.

In [6]:
s2=s1.loc['Monday':'Friday']

In [7]:
s2

Monday       b
Tuesday      c
Wednesday    d
Thursday     e
Friday       f
dtype: object
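
Note that `'Friday'` appears in the result: label slices with `.loc` include both endpoints, whereas positional slices with `.iloc` exclude the stop position. A small sketch of the difference, using a hypothetical stand-in `s` for `s1`:

```python
import pandas as pd

# Hypothetical stand-in for s1: letters indexed by day names, Sunday first.
days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
s = pd.Series(list('abcdefg'), index=days)

by_label = s.loc['Monday':'Friday']   # label slice: 'Friday' is included
by_position = s.iloc[1:6]             # positional slice: position 6 is excluded

print(by_label.equals(by_position))
```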

In [8]:
v = s2.values
for char in 'bcdef':
    assert char in v
assert 'Monday' in s2
assert 'Tuesday' in s2
assert 'Wednesday' in s2
assert 'Thursday' in s2
assert 'Friday' in s2

Use the `.iloc` indexer to slice the `s1` `Series` in a manner that reverses its values/index. Name the new `Series` `s3`.

In [9]:
s3 = s1.iloc[::-1]

In [10]:
s3 

Saturday     g
Friday       f
Thursday     e
Wednesday    d
Tuesday      c
Monday       b
Sunday       a
dtype: object
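
A positional slice with a step of -1 reverses the values and the index together, so each label stays attached to its value. A short sketch on made-up data (the names `s` and `r` are illustrative):

```python
import pandas as pd

s = pd.Series(list('abc'), index=['x', 'y', 'z'])  # illustrative data
r = s.iloc[::-1]

print(list(r.index))   # labels travel with their values
print(r.loc['x'])      # label lookup is unaffected by the new order
```

The original `Series` keeps its order; the slice yields a new object.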

In [11]:
assert ''.join(s3.values)=='gfedcba'
assert list(s3.index)==list(reversed(s1.index))

## DataFrame

Create a `DataFrame` named `df1` with the following properties:

* 10 rows.
* An `age` column with random ages between 0 and 100 (inclusive).
* A `cell_phone` column of randomly sampled categorical values `ios`, `android`, `windows`.
* A `gender` column of randomly sampled categorical values `f` and `m`.
* The order of columns should be `gender`, `age`, `cell_phone`.
* A row index consisting of lowercase alphabetical letters.

In [12]:
lowercaseletters = [i for i in 'abcdefghij']
gender = pd.Series(np.random.choice(['m', 'f'], 10), dtype='category', index=lowercaseletters)
age = pd.Series(np.random.randint(0, 101, 10), index=lowercaseletters)
cell_phone = pd.Series(np.random.choice(["ios", "android", "windows"], 10), dtype='category', index=lowercaseletters)

In [13]:
data={'gender':gender,'age':age,'cell_phone':cell_phone}
df1=pd.DataFrame(data,columns=['gender','age','cell_phone'],index=lowercaseletters)

In [14]:
df1

   gender  age  cell_phone
a       m   14         ios
b       f    4         ios
c       f   27     android
d       m   71         ios
e       f   70     android
f       m   86         ios
g       f   18     windows
h       f   26     android
i       m   94     android
j       m   67         ios
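
Because the data is random, the table changes on every run. If repeatable output is wanted, one option is a seeded NumPy `Generator`. The sketch below is an alternative construction, not the notebook's own; the seed value and the names `rng` and `df` are assumptions for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # the seed value 0 is an arbitrary choice
index = list('abcdefghij')

df = pd.DataFrame(
    {
        'gender': pd.Categorical(rng.choice(['f', 'm'], size=10)),
        # integers() excludes the upper bound by default;
        # endpoint=True makes 100 a possible value.
        'age': rng.integers(0, 100, size=10, endpoint=True),
        'cell_phone': pd.Categorical(rng.choice(['ios', 'android', 'windows'], size=10)),
    },
    index=index,
)
print(df.dtypes)
```

With only 10 samples, a random draw can occasionally miss one of the three phone categories, so a check that all categories are present may fail on some runs regardless of how the data is generated.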


In [15]:
for char in 'abcdefghij':
    assert char in df1.index
assert 'age' in df1.columns
assert 'cell_phone' in df1.columns
assert 'gender' in df1.columns
assert df1.age.dtype.name=='int64'
assert df1.cell_phone.dtype.name=='category'
assert df1.gender.dtype.name=='category'
assert set(df1.gender.unique())=={'f','m'}
assert set(df1.cell_phone.unique())=={'windows','android','ios'}
assert list(df1.columns)==['gender','age','cell_phone']

Create a new `DataFrame` named `df2` by applying the following transformations to `df1`:

* Extract rows `a` through `g`.
* Extract the `age` and `gender` columns, but put `age` first.
* Reverse the rows so they run `g` to `a`.
* Add a new column named `income` of random integer dollar amounts in the range [0, 10000].
* Add a new column named `expenses` of random integer dollar amounts in the range [0, 10000].
* Create a new column named `profit`, computed as `income` minus `expenses`.

In [16]:
# Select rows a..g and the two columns (age first), reverse the row order,
# and copy so that adding columns does not touch df1.
df2 = df1.loc['a':'g', ['age', 'gender']].iloc[::-1].copy()
df2['income'] = np.random.randint(0, 10001, len(df2))
df2['expenses'] = np.random.randint(0, 10001, len(df2))
df2['profit'] = df2['income'] - df2['expenses']

In [17]:
df2

   age  gender  income  expenses  profit
g   18       f    2834       805    2029
f   86       m    7670      1062    6608
e   70       f    1018      5376   -4358
d   71       m    1077      3285   -2208
c   27       f    2841      5206   -2365
b    4       f    8985      3258    5727
a   14       m    9431      9188     243
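
The same kind of derived-column pipeline can also be written as a single chain with `DataFrame.assign`, which returns a new frame instead of modifying one in place. This is a sketch on a hypothetical three-row stand-in (`base`), with an arbitrary seed, rather than the notebook's own solution:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed, for repeatable output
# Hypothetical stand-in for the age/gender slice of df1.
base = pd.DataFrame({'age': [14, 4, 27], 'gender': ['m', 'f', 'f']}, index=list('abc'))

out = (
    base.iloc[::-1]
    .assign(
        income=rng.integers(0, 10_000, size=len(base), endpoint=True),
        expenses=rng.integers(0, 10_000, size=len(base), endpoint=True),
    )
    # A callable sees the frame with income and expenses already added.
    .assign(profit=lambda d: d['income'] - d['expenses'])
)
print(out)
```

The subtraction is vectorized: it operates on whole columns, aligned by index label, with no explicit loop over rows.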


Make sure your code passes the following `assert` statements:

In [18]:
assert df2.index[0]=='g'
assert df2.index[-1]=='a'
assert list(df2.columns)==['age','gender','income','expenses','profit']
assert df2.income.dtype.name=='int64'
assert df2.expenses.dtype.name=='int64'
assert df2.profit.dtype.name=='int64'
assert all(df2.profit+df2.expenses-df2.income==0)

Using row filtering, column selection, and the `.mean()` method, calculate and print the average age for men and women in the `df1` `DataFrame`:

In [19]:
f = df1[df1.gender == 'f']
print(f.age.mean())

29.0


In [20]:
m = df1[df1.gender == 'm']
print(m.age.mean())

66.4
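
Filtering once per category works, but the same per-group averages can be computed in one step with `groupby`. A minimal sketch on made-up data (the frame `people` and its values are illustrative; only the column names mirror `df1`):

```python
import pandas as pd

# Illustrative data; the column names mirror df1 but the values are made up.
people = pd.DataFrame({'gender': ['f', 'm', 'f', 'm'], 'age': [20, 30, 40, 50]})

mean_age = people.groupby('gender')['age'].mean()
print(mean_age)
```

The result is a `Series` indexed by group label, so `mean_age['f']` gives the average for one group. When the grouping column is categorical, as in `df1`, recent pandas versions may ask for an explicit `observed=` argument.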


Use the `.iloc` indexer on `df1` to extract every other row and the last column. Save the resulting `Series` as `s4`:

In [21]:
s4 = df1.iloc[::2, -1]

In [22]:
assert len(s4)==5
assert list(s4.index)==list('acegi')
assert s4.name=='cell_phone'
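
`.iloc` accepts a row selector and a column selector in a single call, and the type of the column selector decides what comes back. A short sketch on an illustrative frame (all names here are placeholders):

```python
import pandas as pd

# Illustrative frame; names are placeholders.
df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': list('abcd')}, index=list('pqrs'))

col = df.iloc[::2, -1]      # scalar column position -> Series
frame = df.iloc[::2, [-1]]  # list of positions -> one-column DataFrame

print(type(col).__name__, type(frame).__name__)
```

A scalar position returns a `Series` named after the column; wrapping the position in a list keeps the result two-dimensional.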

Use the `.loc` indexer to extract all rows and just the `gender` and `age` columns. Save the resulting `DataFrame` as `df3`.

In [23]:
df3=df1.loc[:,['gender','age']]

In [24]:
df3

   gender  age
a       m   14
b       f    4
c       f   27
d       m   71
e       f   70
f       m   86
g       f   18
h       f   26
i       m   94
j       m   67
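
With a list of labels, `.loc` returns the columns in the order the list gives them, which need not match the order in the source frame. A small sketch on illustrative data, also showing the bracket shorthand for the same selection:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})  # illustrative data

picked = df.loc[:, ['c', 'a']]  # columns come back in the order listed
same = df[['c', 'a']]           # bracket shorthand for the same selection

print(picked.equals(same))
```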


In [25]:
assert list(df3.columns)==['gender','age']
assert len(df3)==10
assert list(df3.index)==list('abcdefghij')