#Subway Data
Analyzing Subway and Weather Data

Questions:
    - What variables are related to subway ridership?
        - Which stations have the most riders?
        - What are the ridership patterns over time?
        - How does the weather affect ridership?
    - What patterns can I find in the weather?
        - Is the temperature rising throughout the month?
        - How does weather vary across the city?
        
#Two-Dimensional Data
**Python: List of Lists**, **Numpy: 2D Arrays**, **Pandas: DataFrame**

2D arrays, as opposed to arrays of arrays:
    - More memory efficient
    - Accessing elements is a bit different: a[1, 3] rather than a[1][3]
    - mean(), std(), etc. operate on the entire array
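
A minimal sketch of these differences. The values are made up for illustration and are not part of the ridership data:

```python
import numpy as np

# Hypothetical values, chosen only for illustration
list_of_lists = [[1, 2, 3, 4], [5, 6, 7, 8]]
array_2d = np.array(list_of_lists)

# A list of lists needs two separate indexing steps
from_list = list_of_lists[1][3]

# A 2D array takes the row and column index in one pair of brackets
from_array = array_2d[1, 3]

# mean() with no arguments averages every element of the array
overall_mean = array_2d.mean()

print(from_list, from_array, overall_mean)
```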

In [1]:
import numpy as np

ridership = np.array([
    [   0,    0,    2,    5,    0],
    [1478, 3877, 3674, 2328, 2539],
    [1613, 4088, 3991, 6461, 2691],
    [1560, 3392, 3826, 4787, 2613],
    [1608, 4802, 3932, 4477, 2705],
    [1576, 3933, 3909, 4979, 2685],
    [  95,  229,  255,  496,  201],
    [   2,    0,    1,   27,    0],
    [1438, 3785, 3589, 4174, 2215],
    [1342, 4043, 4009, 4665, 3033]
])

In [2]:
print(ridership[1, 3])
print(ridership[1:3, 3:5])
print(ridership[1, :])

2328
[[2328 2539]
 [6461 2691]]
[1478 3877 3674 2328 2539]


In [3]:
print(ridership[0, :] + ridership[1, :])
print(ridership[:, 0] + ridership[:, 1])

[1478 3877 3676 2333 2539]
[   0 5355 5701 4952 6410 5509  324    2 5223 5385]


In [4]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
print(a + b)

[[ 2  3  4]
 [ 6  7  8]
 [10 11 12]]


Writing a function to:
    1. find the station with the most riders on the first day
    2. return the overall mean riders per day and the mean riders per day for that station

In [5]:
def mean_riders_for_max_station(ridership):
   
    max_station = ridership[0,:].argmax()
    overall_mean = ridership.mean()
    mean_for_max = ridership[:,max_station].mean()
    
    return (overall_mean, mean_for_max)

In [6]:
mean_riders_for_max_station(ridership)

(2342.5999999999999, 3239.9000000000001)

#Numpy Axis
Operations along an Axis

In [7]:
a = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print(a.sum())
print(a.sum(axis=0))
print(a.sum(axis=1))

45
[12 15 18]
[ 6 15 24]


Finding the mean ridership per day for each subway station, then returning the maximum and minimum of those means

In [8]:
ridership = np.array([
    [   0,    0,    2,    5,    0],
    [1478, 3877, 3674, 2328, 2539],
    [1613, 4088, 3991, 6461, 2691],
    [1560, 3392, 3826, 4787, 2613],
    [1608, 4802, 3932, 4477, 2705],
    [1576, 3933, 3909, 4979, 2685],
    [  95,  229,  255,  496,  201],
    [   2,    0,    1,   27,    0],
    [1438, 3785, 3589, 4174, 2215],
    [1342, 4043, 4009, 4665, 3033]
])

In [9]:
def min_and_max_riders_per_day(ridership):

    mean_daily_ridership = ridership.mean(axis=0)
    max_daily_ridership = mean_daily_ridership.max()
    min_daily_ridership = mean_daily_ridership.min()
    
    return (max_daily_ridership, min_daily_ridership)

In [10]:
min_and_max_riders_per_day(ridership)

(3239.9000000000001, 1071.2)

#Numpy and Pandas Data Type

In [11]:
np.array([1,2,3,4,5]).dtype

dtype('int32')

In [12]:
enrollments = np.array([
        ['account_key','status','join_date','days_to_cancel','is_udacity'],
        [448,'canceled','2014-11-10',65,True],
        [448,'canceled','2014-11-05',5,True],
        [448,'canceled','2015-01-27',0,True],
        [448,'canceled','2014-11-10',0,True],
        [448,'current','2015-03-10',np.nan,True]
    ])

In [13]:
enrollments

array([['account_key', 'status', 'join_date', 'days_to_cancel',
        'is_udacity'],
       ['448', 'canceled', '2014-11-10', '65', 'True'],
       ['448', 'canceled', '2014-11-05', '5', 'True'],
       ['448', 'canceled', '2015-01-27', '0', 'True'],
       ['448', 'canceled', '2014-11-10', '0', 'True'],
       ['448', 'current', '2015-03-10', 'nan', 'True']], 
      dtype='<U14')

NumPy has converted every value to a string, because an array holds a single data type. This would cause problems when calculating the mean and other metrics, so a Pandas DataFrame is used instead.

In [14]:
import pandas as pd

enrollments_df = pd.DataFrame({
        'account_key': [448,448,448,448,448],
        'status': ['canceled','canceled','canceled','canceled','current'],
        'join_date': ['2014-11-10','2014-11-05','2015-01-27','2014-11-10','2015-03-10'],
        'days_to_cancel': [65,5,0,0,np.nan],
        'is_udacity': [True,True,True,True,True]
    })

In [15]:
enrollments_df

   account_key  days_to_cancel is_udacity   join_date    status
0          448            65.0       True  2014-11-10  canceled
1          448             5.0       True  2014-11-05  canceled
2          448             0.0       True  2015-01-27  canceled
3          448             0.0       True  2014-11-10  canceled
4          448             NaN       True  2015-03-10   current


In [16]:
enrollments_df.mean(numeric_only=True)

account_key       448.0
days_to_cancel     17.5
is_udacity          1.0
dtype: float64
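
A short sketch of why the DataFrame can do this: each column keeps its own data type, so numeric columns stay numeric. The frame below is a cut-down, hypothetical version of the enrollments data. Passing `numeric_only=True` is an assumption about newer pandas versions, which may raise an error if string columns are included in the mean.

```python
import numpy as np
import pandas as pd

# Cut-down, hypothetical version of the enrollments data
small_df = pd.DataFrame({
    'account_key': [448, 448],
    'status': ['canceled', 'current'],
    'days_to_cancel': [65, np.nan],
    'is_udacity': [True, True]
})

# Each column has its own dtype, unlike a NumPy array
print(small_df.dtypes)

# Restrict the mean to numeric (and boolean) columns; NaN values are skipped
numeric_means = small_df.mean(numeric_only=True)
print(numeric_means)
```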

#Accessing Elements of DataFrame

In [17]:
ridership_df = pd.DataFrame({
        'R003': [0,1478,1613,1560,1608,1576,95,2,1438,1342],
        'R004': [0,3877,4088,3392,4802,3933,229,0,3785,4043],
        'R005': [2,3674,3991,3826,3932,3909,255,1,3589,4009],
        'R006': [5,2328,6461,4787,4477,4979,496,27,4174,4665],
        'R007': [0,2539,2691,2613,2705,2685,201,0,2215,3033]
    },index=[
        '05-01-11','05-02-11','05-03-11','05-04-11','05-05-11',
        '05-06-11','05-07-11','05-08-11','05-09-11','05-10-11'
    ])

In [18]:
ridership_df

          R003  R004  R005  R006  R007
05-01-11     0     0     2     5     0
05-02-11  1478  3877  3674  2328  2539
05-03-11  1613  4088  3991  6461  2691
05-04-11  1560  3392  3826  4787  2613
05-05-11  1608  4802  3932  4477  2705
05-06-11  1576  3933  3909  4979  2685
05-07-11    95   229   255   496   201
05-08-11     2     0     1    27     0
05-09-11  1438  3785  3589  4174  2215
05-10-11  1342  4043  4009  4665  3033


In [19]:
ridership_df.loc['05-02-11']

R003    1478
R004    3877
R005    3674
R006    2328
R007    2539
Name: 05-02-11, dtype: int64

In [20]:
ridership_df.iloc[9]

R003    1342
R004    4043
R005    4009
R006    4665
R007    3033
Name: 05-10-11, dtype: int64

In [21]:
ridership_df.iloc[1,3]

2328

In [22]:
ridership_df.loc['05-02-11','R003']

1478

In [23]:
ridership_df['R005']

05-01-11       2
05-02-11    3674
05-03-11    3991
05-04-11    3826
05-05-11    3932
05-06-11    3909
05-07-11     255
05-08-11       1
05-09-11    3589
05-10-11    4009
Name: R005, dtype: int64

In [24]:
ridership_df.values

array([[   0,    0,    2,    5,    0],
       [1478, 3877, 3674, 2328, 2539],
       [1613, 4088, 3991, 6461, 2691],
       [1560, 3392, 3826, 4787, 2613],
       [1608, 4802, 3932, 4477, 2705],
       [1576, 3933, 3909, 4979, 2685],
       [  95,  229,  255,  496,  201],
       [   2,    0,    1,   27,    0],
       [1438, 3785, 3589, 4174, 2215],
       [1342, 4043, 4009, 4665, 3033]], dtype=int64)

In [25]:
ridership_df.values.mean()

2342.5999999999999

In [26]:
ridership_df = pd.DataFrame(
    data=[[   0,    0,    2,    5,    0],
          [1478, 3877, 3674, 2328, 2539],
          [1613, 4088, 3991, 6461, 2691],
          [1560, 3392, 3826, 4787, 2613],
          [1608, 4802, 3932, 4477, 2705],
          [1576, 3933, 3909, 4979, 2685],
          [  95,  229,  255,  496,  201],
          [   2,    0,    1,   27,    0],
          [1438, 3785, 3589, 4174, 2215],
          [1342, 4043, 4009, 4665, 3033]],
    index=['05-01-11', '05-02-11', '05-03-11', '05-04-11', '05-05-11',
           '05-06-11', '05-07-11', '05-08-11', '05-09-11', '05-10-11'],
    columns=['R003', 'R004', 'R005', 'R006', 'R007']
)

In [27]:
df_1 = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5]})
print(df_1)

df_2 = pd.DataFrame([[0, 1, 2], [3, 4, 5]], columns=['A', 'B', 'C'])
print(df_2)
   
print(ridership_df.iloc[0])
print(ridership_df.loc['05-05-11'])
print(ridership_df['R003'])
print(ridership_df.iloc[1, 3])

   A  B
0  0  3
1  1  4
2  2  5
   A  B  C
0  0  1  2
1  3  4  5
R003    0
R004    0
R005    2
R006    5
R007    0
Name: 05-01-11, dtype: int64
R003    1608
R004    4802
R005    3932
R006    4477
R007    2705
Name: 05-05-11, dtype: int64
05-01-11       0
05-02-11    1478
05-03-11    1613
05-04-11    1560
05-05-11    1608
05-06-11    1576
05-07-11      95
05-08-11       2
05-09-11    1438
05-10-11    1342
Name: R003, dtype: int64
2328


In [28]:
print(ridership_df.iloc[1:4])

print(ridership_df[['R003', 'R005']])

df = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5]})
print(df.sum())
print(df.sum(axis=1))
print(df.values.sum())

          R003  R004  R005  R006  R007
05-02-11  1478  3877  3674  2328  2539
05-03-11  1613  4088  3991  6461  2691
05-04-11  1560  3392  3826  4787  2613
          R003  R005
05-01-11     0     2
05-02-11  1478  3674
05-03-11  1613  3991
05-04-11  1560  3826
05-05-11  1608  3932
05-06-11  1576  3909
05-07-11    95   255
05-08-11     2     1
05-09-11  1438  3589
05-10-11  1342  4009
A     3
B    12
dtype: int64
0    3
1    5
2    7
dtype: int64
15


In [29]:
def mean_riders_for_max_station(ridership):
    # idxmax() returns the column label of the maximum, which is needed to index the DataFrame below
    max_station = ridership.iloc[0].idxmax()
    overall_mean = ridership.values.mean()
    mean_for_max = ridership[max_station].mean()
    
    return (overall_mean, mean_for_max)

In [30]:
mean_riders_for_max_station(ridership_df)

(2342.5999999999999, 3239.9)

#Loading Data into a DataFrame
DataFrames are a great data structure for representing CSV files:

In [31]:
subway_df = pd.read_csv('nyc_subway_weather.csv')

In [32]:
subway_df.head() #print first 5 rows

Unnamed: 0,UNIT,DATEn,TIMEn,ENTRIESn,EXITSn,ENTRIESn_hourly,EXITSn_hourly,datetime,hour,day_week,...,pressurei,rain,tempi,wspdi,meanprecipi,meanpressurei,meantempi,meanwspdi,weather_lat,weather_lon
0,R003,05-01-11,00:00:00,4388333,2911002,0,0,2011-05-01 00:00:00,0,6,...,30.22,0,55.9,3.5,0,30.258,55.98,7.86,40.700348,-73.887177
1,R003,05-01-11,04:00:00,4388333,2911002,0,0,2011-05-01 04:00:00,4,6,...,30.25,0,52.0,3.5,0,30.258,55.98,7.86,40.700348,-73.887177
2,R003,05-01-11,12:00:00,4388333,2911002,0,0,2011-05-01 12:00:00,12,6,...,30.28,0,62.1,6.9,0,30.258,55.98,7.86,40.700348,-73.887177
3,R003,05-01-11,16:00:00,4388333,2911002,0,0,2011-05-01 16:00:00,16,6,...,30.26,0,57.9,15.0,0,30.258,55.98,7.86,40.700348,-73.887177
4,R003,05-01-11,20:00:00,4388333,2911002,0,0,2011-05-01 20:00:00,20,6,...,30.28,0,52.0,10.4,0,30.258,55.98,7.86,40.700348,-73.887177


In [33]:
subway_df.describe()

Unnamed: 0,ENTRIESn,EXITSn,ENTRIESn_hourly,EXITSn_hourly,hour,day_week,weekday,latitude,longitude,fog,...,pressurei,rain,tempi,wspdi,meanprecipi,meanpressurei,meantempi,meanwspdi,weather_lat,weather_lon
count,42649.0,42649.0,42649.0,42649.0,42649.0,42649.0,42649.0,42649.0,42649.0,42649.0,...,42649.0,42649.0,42649.0,42649.0,42649.0,42649.0,42649.0,42649.0,42649.0,42649.0
mean,28124860.0,19869930.0,1886.589955,1361.487866,10.046754,2.905719,0.714436,40.724647,-73.940364,0.009824,...,29.971096,0.224741,63.10378,6.927872,0.004618,29.971096,63.10378,6.927872,40.728555,-73.938693
std,30436070.0,20289860.0,2952.385585,2183.845409,6.938928,2.079231,0.451688,0.07165,0.059713,0.098631,...,0.137942,0.417417,8.455597,4.510178,0.016344,0.131158,6.939011,3.179832,0.06542,0.059582
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,40.576152,-74.073622,0.0,...,29.55,0.0,46.9,0.0,0.0,29.59,49.4,0.0,40.600204,-74.01487
25%,10397620.0,7613712.0,274.0,237.0,4.0,1.0,0.0,40.677107,-73.987342,0.0,...,29.89,0.0,57.0,4.6,0.0,29.913333,58.283333,4.816667,40.688591,-73.98513
50%,18183890.0,13316090.0,905.0,664.0,12.0,3.0,1.0,40.717241,-73.953459,0.0,...,29.96,0.0,61.0,6.9,0.0,29.958,60.95,6.166667,40.72057,-73.94915
75%,32630490.0,23937710.0,2255.0,1537.0,16.0,5.0,1.0,40.759123,-73.907733,0.0,...,30.06,0.0,69.1,9.2,0.0,30.06,67.466667,8.85,40.755226,-73.912033
max,235774600.0,149378200.0,32814.0,34828.0,20.0,6.0,1.0,40.889185,-73.755383,1.0,...,30.32,1.0,86.0,23.0,0.1575,30.293333,79.8,17.083333,40.862064,-73.694176


#Calculating Correlation (Pearson's r)
For each pair (x, y): are both values above their means, both below, or one above and one below?

Pearson's r:
    - First standardize each variable
    - Multiply each pair of standardized values, and take the average
r = average of (x in std units) * (y in std units)

In [34]:
def correlation(x, y):
    std_x = (x - x.mean()) / x.std(ddof = 0)
    std_y = (y - y.mean()) / y.std(ddof = 0)
    return (std_x * std_y).mean()

In [35]:
entries = subway_df['ENTRIESn_hourly']
cum_entries = subway_df['ENTRIESn']
rain = subway_df['meanprecipi']
temp = subway_df['meantempi']

print(correlation(entries, rain))
print(correlation(entries, temp))
print(correlation(rain, temp))
print(correlation(entries, cum_entries))

0.03564851577223041
-0.026693348321569912
-0.22903432340833663
0.5858954707662182
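
As a sanity check, the hand-written function can be compared with NumPy's built-in `np.corrcoef`. This is only a sketch on small made-up series, not on the subway data:

```python
import numpy as np
import pandas as pd

def correlation(x, y):
    # Standardize with the population standard deviation (ddof=0)
    std_x = (x - x.mean()) / x.std(ddof=0)
    std_y = (y - y.mean()) / y.std(ddof=0)
    return (std_x * std_y).mean()

# Hypothetical data, chosen only for illustration
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 4.0, 5.0, 4.0, 5.0])

r_manual = correlation(x, y)

# corrcoef returns a 2x2 matrix; the off-diagonal entry is r for (x, y)
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)
```

The two values should agree up to floating-point rounding.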


#Pandas Axis Names
Instead of axis = 0 or axis = 1, use axis = 'index' or axis = 'columns'
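
A minimal sketch of the named axes on a small made-up DataFrame:

```python
import pandas as pd

# Hypothetical frame, chosen only for illustration
df = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5]})

# axis='index' collapses the rows, leaving one value per column (same as axis=0)
column_totals = df.sum(axis='index')

# axis='columns' collapses the columns, leaving one value per row (same as axis=1)
row_totals = df.sum(axis='columns')

print(column_totals)
print(row_totals)
```

The name says which axis is collapsed by the operation, not which axis survives in the result.
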
#DataFrame Vectorized Operations
Similar to vectorized operations for 2D Numpy arrays

In [36]:
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
df2 = pd.DataFrame({'a': [10, 20, 30], 'b': [40, 50, 60], 'c': [70, 80, 90]})
print(df1 + df2)

    a   b   c
0  11  44  77
1  22  55  88
2  33  66  99


In [37]:
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
df2 = pd.DataFrame({'d': [10, 20, 30], 'c': [40, 50, 60], 'b': [70, 80, 90]})
print(df1 + df2)

    a   b   c   d
0 NaN  74  47 NaN
1 NaN  85  58 NaN
2 NaN  96  69 NaN


In [38]:
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
                   index=['row1', 'row2', 'row3'])
df2 = pd.DataFrame({'a': [10, 20, 30], 'b': [40, 50, 60], 'c': [70, 80, 90]},
                   index=['row4', 'row3', 'row2'])
print(df1 + df2)

       a   b   c
row1 NaN NaN NaN
row2  32  65  98
row3  23  56  89
row4 NaN NaN NaN
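
If the NaN rows are not wanted, one option is `DataFrame.add` with `fill_value`, which substitutes a value for labels that are missing from one side. A sketch with a single column, assuming that treating a missing row as 0 makes sense for the data:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3]}, index=['row1', 'row2', 'row3'])
df2 = pd.DataFrame({'a': [10, 20, 30]}, index=['row4', 'row3', 'row2'])

# Plain + leaves NaN wherever a label exists in only one frame
plain_sum = df1 + df2

# add() with fill_value=0 treats the missing side as 0 instead
filled_sum = df1.add(df2, fill_value=0)

print(plain_sum)
print(filled_sum)
```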


Converting a dataframe from total to hourly entries and exits

In [39]:
entries_and_exits = pd.DataFrame({
    'ENTRIESn': [3144312, 3144335, 3144353, 3144424, 3144594,
                 3144808, 3144895, 3144905, 3144941, 3145094],
    'EXITSn': [1088151, 1088159, 1088177, 1088231, 1088275,
               1088317, 1088328, 1088331, 1088420, 1088753]
})

In [40]:
def get_hourly_entries_and_exits(entries_and_exits):
    return entries_and_exits - entries_and_exits.shift(1)

In [41]:
get_hourly_entries_and_exits(entries_and_exits)

   ENTRIESn  EXITSn
0       NaN     NaN
1      23.0     8.0
2      18.0    18.0
3      71.0    54.0
4     170.0    44.0
5     214.0    42.0
6      87.0    11.0
7      10.0     3.0
8      36.0    89.0
9     153.0   333.0
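
Subtracting the shifted frame is the same computation that pandas offers as `DataFrame.diff()`. A sketch on the first three rows of the data above, comparing both approaches:

```python
import pandas as pd

# First three rows of the cumulative counts used above
totals = pd.DataFrame({
    'ENTRIESn': [3144312, 3144335, 3144353],
    'EXITSn': [1088151, 1088159, 1088177]
})

# Subtract the previous row by hand
via_shift = totals - totals.shift(1)

# diff() computes the difference from the previous row directly
via_diff = totals.diff()

print(via_shift.equals(via_diff))
```

In both cases the first row is NaN, because it has no previous row to subtract.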


#Non-built-in Function for DataFrame
DataFrame applymap()

In [42]:
df = pd.DataFrame({
        'a': [1, 2, 3],
        'b': [10, 20, 30],
        'c': [5, 10, 15]
    })
def add_one(x):
    return x + 1

print(df.applymap(add_one))

   a   b   c
0  2  11   6
1  3  21  11
2  4  31  16
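
A note for newer pandas versions: from pandas 2.1 on, `applymap()` is deprecated in favor of `DataFrame.map()`, which does the same element-wise job. The sketch below picks whichever method is available. The `hasattr` check is just one possible way to stay compatible with both.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

def add_one(x):
    return x + 1

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap
elementwise = df.map if hasattr(df, 'map') else df.applymap

result = elementwise(add_one)
print(result)
```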


Converting numerical grades to letter grades:
    - 90-100: A
    - 80-89: B
    - 70-79: C
    - 60-69: D
    - 0-59: F

In [43]:
grades_df = pd.DataFrame(
    data={'exam1': [43, 81, 78, 75, 89, 70, 91, 65, 98, 87],
          'exam2': [24, 63, 56, 56, 67, 51, 79, 46, 72, 60]},
    index=['Andre', 'Barry', 'Chris', 'Dan', 'Emilio', 
           'Fred', 'Greta', 'Humbert', 'Ivan', 'James']
)
grades_df

         exam1  exam2
Andre       43     24
Barry       81     63
Chris       78     56
Dan         75     56
Emilio      89     67
Fred        70     51
Greta       91     79
Humbert     65     46
Ivan        98     72
James       87     60


In [44]:
def convert_grades(grades):
    grade_letter = None
    if grades >= 90 and grades <= 100:
        grade_letter = 'A'
    elif grades >= 80 and grades <= 89:
        grade_letter = 'B'
    elif grades >= 70 and grades <= 79:
        grade_letter = 'C'
    elif grades >= 60 and grades <= 69:
        grade_letter = 'D'
    else:
        grade_letter = 'F'
    return grade_letter

In [45]:
print(grades_df.applymap(convert_grades))

        exam1 exam2
Andre       F     F
Barry       B     D
Chris       C     F
Dan         C     F
Emilio      B     D
Fred        C     F
Greta       A     C
Humbert     D     F
Ivan        A     C
James       B     D


##DataFrame apply()
Works on a column or a row of dataframe

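**Use Case 1:** When the function needs a whole column or row at once (for example, to compare each value with the rest of its column) and returns a column or row of the same size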

In [46]:
grades_df = pd.DataFrame(
    data={'exam1': [43, 81, 78, 75, 89, 70, 91, 65, 98, 87],
          'exam2': [24, 63, 56, 56, 67, 51, 79, 46, 72, 60]},
    index=['Andre', 'Barry', 'Chris', 'Dan', 'Emilio', 
           'Fred', 'Greta', 'Humbert', 'Ivan', 'James']
)

In [47]:
def convert_grades_curve(exam_grades):
    return pd.qcut(exam_grades,[0, 0.1, 0.2, 0.5, 0.8, 1],labels=['F', 'D', 'C', 'B', 'A'])

print(convert_grades_curve(grades_df['exam1']))
print(grades_df.apply(convert_grades_curve))

Andre      F
Barry      B
Chris      C
Dan        C
Emilio     B
Fred       C
Greta      A
Humbert    D
Ivan       A
James      B
Name: exam1, dtype: category
Categories (5, object): [F < D < C < B < A]
        exam1 exam2
Andre       F     F
Barry       B     B
Chris       C     C
Dan         C     C
Emilio      B     B
Fred        C     C
Greta       A     A
Humbert     D     D
Ivan        A     A
James       B     B


In [48]:
def standardize(df):
    std_df = (df - df.mean()) / df.std(ddof = 0)
    return std_df

In [49]:
print(standardize(grades_df['exam1']))
print(grades_df.apply(standardize))

Andre     -2.315341
Barry      0.220191
Chris      0.020017
Dan       -0.180156
Emilio     0.753987
Fred      -0.513779
Greta      0.887436
Humbert   -0.847401
Ivan       1.354508
James      0.620538
Name: exam1, dtype: float64
            exam1     exam2
Andre   -2.315341 -2.304599
Barry    0.220191  0.386400
Chris    0.020017 -0.096600
Dan     -0.180156 -0.096600
Emilio   0.753987  0.662400
Fred    -0.513779 -0.441600
Greta    0.887436  1.490400
Humbert -0.847401 -0.786600
Ivan     1.354508  1.007400
James    0.620538  0.179400


**Use Case 2:** When the function returns a single value for each column or row. In this case, applying it to a DataFrame produces a Series.

In [50]:
df = pd.DataFrame({
    'a': [4, 5, 3, 1, 2],
    'b': [20, 10, 40, 50, 30],
    'c': [25, 20, 5, 15, 10]
})

In [51]:
print(df.apply(np.mean))
print(df.apply(np.max))

a     3
b    30
c    15
dtype: float64
a     5
b    50
c    25
dtype: int64


In [52]:
def second_largest(column):
    # Series.sort() was removed from pandas; sort_values() returns a sorted copy
    column_sort = column.sort_values(ascending=False)
    return column_sort.iloc[1]

In [53]:
df.apply(second_largest)

a     4
b    40
c    20
dtype: int64
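
An alternative sketch of the same idea using `Series.nlargest`, which avoids sorting the whole column. This is one possible variant, not the only way to write it:

```python
import pandas as pd

df = pd.DataFrame({
    'a': [4, 5, 3, 1, 2],
    'b': [20, 10, 40, 50, 30],
    'c': [25, 20, 5, 15, 10]
})

def second_largest(column):
    # nlargest(2) keeps the two biggest values in descending order,
    # so the last of them is the second-largest
    return column.nlargest(2).iloc[-1]

result = df.apply(second_largest)
print(result)
```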

#Adding a DataFrame to a Series

In [54]:
s = pd.Series([1, 2, 3, 4])
df = pd.DataFrame({
    0: [10, 20, 30, 40],
    1: [50, 60, 70, 80],
    2: [90, 100, 110, 120],
    3: [130, 140, 150, 160]
})

print(df)
print('') # Create a blank line between outputs
print(df + s)

    0   1    2    3
0  10  50   90  130
1  20  60  100  140
2  30  70  110  150
3  40  80  120  160

    0   1    2    3
0  11  52   93  134
1  21  62  103  144
2  31  72  113  154
3  41  82  123  164


In [55]:
s = pd.Series([1, 2, 3, 4])
df = pd.DataFrame({0: [10], 1: [20], 2: [30], 3: [40]})

print(df)
print('') # Create a blank line between outputs
print(df + s)

    0   1   2   3
0  10  20  30  40

    0   1   2   3
0  11  22  33  44


In [56]:
s = pd.Series([1, 2, 3, 4])
df = pd.DataFrame({0: [10, 20, 30, 40]})

print(df)
print('') # Create a blank line between outputs
print(df + s)
print('')
print(df.add(s,axis='columns'))

    0
0  10
1  20
2  30
3  40

    0   1   2   3
0  11 NaN NaN NaN
1  21 NaN NaN NaN
2  31 NaN NaN NaN
3  41 NaN NaN NaN

    0   1   2   3
0  11 NaN NaN NaN
1  21 NaN NaN NaN
2  31 NaN NaN NaN
3  41 NaN NaN NaN


In [57]:
s = pd.Series([1, 2, 3, 4])
df = pd.DataFrame({0: [10, 20, 30, 40]})

print(df)
print('') # Create a blank line between outputs
print(df.add(s,axis='index'))

    0
0  10
1  20
2  30
3  40

    0
0  11
1  22
2  33
3  44


In [58]:
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({
    'a': [10, 20, 30, 40],
    'b': [50, 60, 70, 80],
    'c': [90, 100, 110, 120],
    'd': [130, 140, 150, 160]
})

print(df)
print('') # Create a blank line between outputs
print(df + s)

    a   b    c    d
0  10  50   90  130
1  20  60  100  140
2  30  70  110  150
3  40  80  120  160

    a   b    c    d
0  11  52   93  134
1  21  62  103  144
2  31  72  113  154
3  41  82  123  164


In [59]:
s = pd.Series([1, 2, 3, 4])
df = pd.DataFrame({
    'a': [10, 20, 30, 40],
    'b': [50, 60, 70, 80],
    'c': [90, 100, 110, 120],
    'd': [130, 140, 150, 160]
})

print(df)
print('') # Create a blank line between outputs
print(df + s)

    a   b    c    d
0  10  50   90  130
1  20  60  100  140
2  30  70  110  150
3  40  80  120  160

    a   b   c   d   0   1   2   3
0 NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN


In [60]:
s = pd.Series([1, 2, 3, 4], index=['b', 'c', 'd', 'e'])
df = pd.DataFrame({
    'a': [10, 20, 30, 40],
    'b': [50, 60, 70, 80],
    'c': [90, 100, 110, 120],
    'd': [130, 140, 150, 160]
})

print(df)
print('') # Create a blank line between outputs
print(df + s)

    a   b    c    d
0  10  50   90  130
1  20  60  100  140
2  30  70  110  150
3  40  80  120  160

    a   b    c    d   e
0 NaN  51   92  133 NaN
1 NaN  61  102  143 NaN
2 NaN  71  112  153 NaN
3 NaN  81  122  163 NaN
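
To summarize the examples in this section: `df + s` matches the Series index against the DataFrame's column names, while `df.add(s, axis='index')` matches it against the row labels. A compact sketch of both on the same data:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
df = pd.DataFrame({
    0: [10, 20, 30, 40],
    1: [50, 60, 70, 80],
    2: [90, 100, 110, 120],
    3: [130, 140, 150, 160]
})

# Default: s[i] is added to every value in column i
by_columns = df + s

# axis='index': s[i] is added to every value in row i
by_rows = df.add(s, axis='index')

print(by_columns)
print(by_rows)
```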
