In [2]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

A pandas Series is a one-dimensional array whose values each carry an index label, as shown below.

In [None]:
obj = Series([3,6,9,12])

In [4]:
obj

0     3
1     6
2     9
3    12
dtype: int64

A Series looks much like a single column of a spreadsheet.  Much like calling values() on a dictionary, the values attribute of a Series returns its data (as a NumPy array), and the index attribute returns the labels.

In [5]:
obj.values

array([ 3,  6,  9, 12], dtype=int64)

In [6]:
obj.index

RangeIndex(start=0, stop=4, step=1)
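
As a minimal sketch of how the two pieces relate: values and index are attributes (no parentheses), and a Series can be rebuilt from them.  The name rebuilt is only illustrative and is not part of the class notebook.

```python
import numpy as np
import pandas as pd

obj = pd.Series([3, 6, 9, 12])

# .values is an attribute that returns the underlying NumPy array
vals = obj.values
# .index is an attribute that returns the row labels (a RangeIndex by default)
idx = obj.index

# Rebuilding a Series from the two pieces gives back an equal Series
rebuilt = pd.Series(vals, index=idx)
print(type(vals).__name__, list(idx), rebuilt.equals(obj))
```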

We can also supply our own index labels alongside the data, as shown below.

In [7]:
ww2_cas = Series([8700000, 4300000, 3000000, 2100000, 400000],
                 index=['USSR', 'Germany', 'China', 'Japan', 'USA'])

In [8]:
ww2_cas

USSR       8700000
Germany    4300000
China      3000000
Japan      2100000
USA         400000
dtype: int64

We can then look up a value by its index label, filter by value, and test whether a label is present.

In [9]:
ww2_cas['USA']

400000

In [10]:
# Check for countries with casualties > 4 million
ww2_cas[ww2_cas > 4000000]

USSR       8700000
Germany    4300000
dtype: int64

In [11]:
'USSR' in ww2_cas

True
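
The three lookups above can be put side by side in one sketch.  The intermediate names (mask, over_4m, has_label, has_value) are illustrative.  One detail worth noting is that `in` checks the index labels, not the stored values.

```python
import pandas as pd

ww2_cas = pd.Series(
    [8700000, 4300000, 3000000, 2100000, 400000],
    index=['USSR', 'Germany', 'China', 'Japan', 'USA'],
)

# Label lookup returns a single value
usa = ww2_cas['USA']

# A comparison produces a boolean Series, which then acts as a row filter
mask = ww2_cas > 4000000
over_4m = ww2_cas[mask]

# `in` tests the index labels, so a stored value is not found this way
has_label = 'USSR' in ww2_cas
has_value = 400000 in ww2_cas
print(usa, list(over_4m.index), has_label, has_value)
```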

We can easily convert a Series to a dictionary, and a dictionary back to a Series, as shown below.

In [12]:
ww2_dict = ww2_cas.to_dict()

In [13]:
ww2_dict

{'USSR': 8700000,
 'Germany': 4300000,
 'China': 3000000,
 'Japan': 2100000,
 'USA': 400000}

In [14]:
ww2_series = Series(ww2_dict)

In [15]:
ww2_series

USSR       8700000
Germany    4300000
China      3000000
Japan      2100000
USA         400000
dtype: int64

We can also build a new Series from a dictionary and pass a list of index labels, which selects and orders the entries we want, as shown below.

In [16]:
countries = ['China', 'Germany', 'Japan']

In [17]:
obj2 = Series(ww2_dict, index=countries)

In [19]:
obj2

China      3000000
Germany    4300000
Japan      2100000
dtype: int64

We can also check for null values with the pandas functions isnull and notnull.

In [20]:
pd.isnull(obj2)

China      False
Germany    False
Japan      False
dtype: bool

In [21]:
pd.notnull(obj2)

China      True
Germany    True
Japan      True
dtype: bool
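
Since obj2 has no missing entries, both checks above come back uniform.  A sketch of the case where they differ, assuming a made-up label 'Atlantis' that is not a key in the dictionary (the names labels, obj3 and missing are also illustrative):

```python
import pandas as pd

ww2_dict = {'USSR': 8700000, 'Germany': 4300000, 'China': 3000000,
            'Japan': 2100000, 'USA': 400000}

# 'Atlantis' is a hypothetical label with no matching key in ww2_dict
labels = ['China', 'Germany', 'Japan', 'Atlantis']
obj3 = pd.Series(ww2_dict, index=labels)

# The unmatched label gets NaN, and the dtype becomes float64 to hold it
missing = pd.isnull(obj3)
print(obj3)
print(missing)
```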

In [22]:
ww2_series + obj2

China      6000000.0
Germany    8600000.0
Japan      4200000.0
USA              NaN
USSR             NaN
dtype: float64
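
The NaN entries in the sum come from index alignment: USA and USSR exist in only one of the two Series.  A minimal sketch of that behavior, plus one possible way around it using Series.add with fill_value (the names total and filled are illustrative):

```python
import pandas as pd

ww2_series = pd.Series({'USSR': 8700000, 'Germany': 4300000, 'China': 3000000,
                        'Japan': 2100000, 'USA': 400000})
obj2 = ww2_series[['China', 'Germany', 'Japan']]

# Plain + aligns on labels; labels present on only one side become NaN
total = ww2_series + obj2

# Series.add with fill_value treats the missing side as 0 instead
filled = ww2_series.add(obj2, fill_value=0)
print(total)
print(filled)
```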

In [23]:
obj2.name = 'World War 2 Casualties'

In [24]:
obj2

China      3000000
Germany    4300000
Japan      2100000
Name: World War 2 Casualties, dtype: int64

In [25]:
obj2.index.name = 'Countries'

In [26]:
obj2

Countries
China      3000000
Germany    4300000
Japan      2100000
Name: World War 2 Casualties, dtype: int64

A DataFrame is the pandas equivalent of a spreadsheet: a table of rows with labeled columns.  I find it much easier to use than numpy's arrays and matrices, and it still supports arithmetic and the usual statistical methods such as sum and mean.

In [2]:
import numpy as np
import pandas as pd
from pandas import Series,DataFrame

In [4]:
import webbrowser
website = 'https://en.wikipedia.org/wiki/NFL_win-loss_records'
webbrowser.open(website)

True

Python's built-in webbrowser module can open the page that holds the data you want.  From there, copy any table to your clipboard and call pd.read_clipboard() to load it into a data frame.  Below I captured the NFL win-loss table that way, much like taking a snippet.  Very cool!

In [5]:
nfl_frame = pd.read_clipboard()

In [6]:
nfl_frame

Unnamed: 0,Rank,Team,GP,Won,Lost,Tied,Pct.,First NFL Season,Division
0,1,Dallas Cowboys,914,520,388,6,0.572,1960,NFC East
1,2,Green Bay Packers,1368,756,574,38,0.567,1921,NFC North
2,3,New England Patriots,916,512,395,9,0.564,1960,AFC East
3,4,Chicago Bears,1402,769,591,42,0.563,1920,NFC North
4,5,Baltimore Ravens,384,214,169,1,0.559,1996,AFC North
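
Clipboard contents cannot be reproduced from a script, so here is a sketch that parses the same kind of text from an in-memory string with pd.read_csv.  The tab-separated layout is an assumption about what a copied table looks like, and the names copied and nfl_sample are illustrative.  As far as I know, read_clipboard forwards extra keyword arguments to read_csv, so the thousands option should apply there as well.

```python
import io
import pandas as pd

# Tab-separated text standing in for a table copied to the clipboard
# (values taken from the first two rows shown above)
copied = (
    "Rank\tTeam\tGP\tWon\tLost\tTied\tPct.\tFirst NFL Season\tDivision\n"
    "1\tDallas Cowboys\t914\t520\t388\t6\t0.572\t1960\tNFC East\n"
    "2\tGreen Bay Packers\t1,368\t756\t574\t38\t0.567\t1921\tNFC North\n"
)

# thousands=',' lets values such as "1,368" parse as integers
nfl_sample = pd.read_csv(io.StringIO(copied), sep="\t", thousands=",")
print(nfl_sample.dtypes)
```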


The columns attribute lists the headings, and any heading that is a valid Python name (one word, no spaces) can be pulled out as an attribute of the frame.

In [7]:
nfl_frame.columns

Index(['Rank', 'Team', 'GP', 'Won', 'Lost', 'Tied', 'Pct.', 'First NFL Season',
       'Division'],
      dtype='object')

In [9]:
nfl_frame.Team

0          Dallas Cowboys
1       Green Bay Packers
2    New England Patriots
3           Chicago Bears
4        Baltimore Ravens
Name: Team, dtype: object

If the heading contains more than one word, index the frame with the exact name in quotes inside square brackets.  Check the exact spelling with the columns attribute first.

In [10]:
nfl_frame['First NFL Season']

0    1960
1    1921
2    1960
3    1920
4    1996
Name: First NFL Season, dtype: int64
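
A small sketch comparing the two access styles on a two-row sample taken from the table above.  The names frame, by_attr, by_key and same are illustrative.

```python
import pandas as pd

# Two-row sample using values from the table above
frame = pd.DataFrame({
    'Team': ['Dallas Cowboys', 'Green Bay Packers'],
    'First NFL Season': [1960, 1921],
})

# Attribute access works because 'Team' is a valid Python identifier
by_attr = frame.Team

# Bracket access works for any column name, including ones with spaces
by_key = frame['First NFL Season']

# Both styles return the same Series for a one-word heading
same = frame['Team'].equals(by_attr)
print(same, by_key.tolist())
```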

To select multiple columns, call DataFrame() with the data frame you want to use and a list of the column names in quotes, as shown below.

In [12]:
DataFrame(nfl_frame, columns=['Team','First NFL Season'])

Unnamed: 0,Team,First NFL Season
0,Dallas Cowboys,1960
1,Green Bay Packers,1921
2,New England Patriots,1960
3,Chicago Bears,1920
4,Baltimore Ravens,1996
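
The same selection can also be written by indexing the frame with a list of names.  A minimal sketch on a small sample from the table above, showing that the two forms agree (frame, subset and via_ctor are illustrative names):

```python
import pandas as pd

frame = pd.DataFrame({
    'Team': ['Dallas Cowboys', 'Green Bay Packers'],
    'Won': [520, 756],
    'First NFL Season': [1960, 1921],
})

# Indexing with a list of names returns a new frame with just those columns
subset = frame[['Team', 'First NFL Season']]

# The constructor form used above selects the same columns
via_ctor = pd.DataFrame(frame, columns=['Team', 'First NFL Season'])
print(subset.equals(via_ctor))
```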


Here is how we pull rows.

In [14]:
nfl_frame.head()  # shows the first five rows; pass a number to pull more or fewer

Unnamed: 0,Rank,Team,GP,Won,Lost,Tied,Pct.,First NFL Season,Division
0,1,Dallas Cowboys,914,520,388,6,0.572,1960,NFC East
1,2,Green Bay Packers,1368,756,574,38,0.567,1921,NFC North
2,3,New England Patriots,916,512,395,9,0.564,1960,AFC East
3,4,Chicago Bears,1402,769,591,42,0.563,1920,NFC North
4,5,Baltimore Ravens,384,214,169,1,0.559,1996,AFC North


In my class, the instructor uses the .ix indexer, but .ix was deprecated in pandas 0.20 (released in 2017) and removed in pandas 1.0, so I had to use loc or iloc instead.  loc selects by label and iloc selects by integer position.  With the default integer index used here, loc behaved the same as ix, and the exact structure of the instructor's code worked with loc in place of ix.

In [18]:
nfl_frame.loc[3]

Rank                            4
Team                Chicago Bears
GP                          1,402
Won                           769
Lost                          591
Tied                           42
Pct.                        0.563
First NFL Season             1920
Division                NFC North
Name: 3, dtype: object
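
With the default index, labels and positions coincide, which hides the difference between loc and iloc.  A sketch with a deliberately non-contiguous index makes it visible (the frame is a three-row sample from the table above; by_label and by_position are illustrative names):

```python
import pandas as pd

frame = pd.DataFrame(
    {'Team': ['Dallas Cowboys', 'Chicago Bears', 'Baltimore Ravens'],
     'Won': [520, 769, 214]},
    index=[0, 3, 4],  # labels deliberately differ from positions
)

by_label = frame.loc[3]      # the row whose index label is 3
by_position = frame.iloc[1]  # the second row, whatever its label is

print(by_label['Team'], by_position['Team'])
```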

In [19]:
nfl_frame['Stadium'] = "Levi's Stadium"

In [20]:
nfl_frame

Unnamed: 0,Rank,Team,GP,Won,Lost,Tied,Pct.,First NFL Season,Division,Stadium
0,1,Dallas Cowboys,914,520,388,6,0.572,1960,NFC East,Levi's Stadium
1,2,Green Bay Packers,1368,756,574,38,0.567,1921,NFC North,Levi's Stadium
2,3,New England Patriots,916,512,395,9,0.564,1960,AFC East,Levi's Stadium
3,4,Chicago Bears,1402,769,591,42,0.563,1920,NFC North,Levi's Stadium
4,5,Baltimore Ravens,384,214,169,1,0.559,1996,AFC North,Levi's Stadium


In [21]:
nfl_frame['Stadium'] = np.arange(5)

In [22]:
nfl_frame

Unnamed: 0,Rank,Team,GP,Won,Lost,Tied,Pct.,First NFL Season,Division,Stadium
0,1,Dallas Cowboys,914,520,388,6,0.572,1960,NFC East,0
1,2,Green Bay Packers,1368,756,574,38,0.567,1921,NFC North,1
2,3,New England Patriots,916,512,395,9,0.564,1960,AFC East,2
3,4,Chicago Bears,1402,769,591,42,0.563,1920,NFC North,3
4,5,Baltimore Ravens,384,214,169,1,0.559,1996,AFC North,4


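We can even assign a Series to a column.  Its values are matched to rows by index label, and rows with no matching label are left empty (NaN), as shown below.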

In [23]:
stadiums = Series(["Levi's Stadium", 'AT&T Stadium'], index=[4, 0])

In [24]:
stadiums

4    Levi's Stadium
0      AT&T Stadium
dtype: object

In [25]:
nfl_frame['Stadium'] = stadiums

In [26]:
nfl_frame

Unnamed: 0,Rank,Team,GP,Won,Lost,Tied,Pct.,First NFL Season,Division,Stadium
0,1,Dallas Cowboys,914,520,388,6,0.572,1960,NFC East,AT&T Stadium
1,2,Green Bay Packers,1368,756,574,38,0.567,1921,NFC North,
2,3,New England Patriots,916,512,395,9,0.564,1960,AFC East,
3,4,Chicago Bears,1402,769,591,42,0.563,1920,NFC North,
4,5,Baltimore Ravens,384,214,169,1,0.559,1996,AFC North,Levi's Stadium
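
The empty cells above are NaN values produced by index alignment.  A minimal sketch on a three-row sample, assuming a Series that covers only one of the row labels (frame, stadiums and n_missing are illustrative names):

```python
import pandas as pd

frame = pd.DataFrame({'Team': ['Dallas Cowboys', 'Green Bay Packers',
                               'New England Patriots']})

# Only label 0 is present in the Series; labels 1 and 2 are not
stadiums = pd.Series(['AT&T Stadium'], index=[0])
frame['Stadium'] = stadiums

# Rows without a matching label are filled with NaN
n_missing = frame['Stadium'].isnull().sum()
print(frame)
print(n_missing)
```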


We can delete a column with del, as shown below.

In [27]:
del nfl_frame['Stadium']

In [28]:
nfl_frame

Unnamed: 0,Rank,Team,GP,Won,Lost,Tied,Pct.,First NFL Season,Division
0,1,Dallas Cowboys,914,520,388,6,0.572,1960,NFC East
1,2,Green Bay Packers,1368,756,574,38,0.567,1921,NFC North
2,3,New England Patriots,916,512,395,9,0.564,1960,AFC East
3,4,Chicago Bears,1402,769,591,42,0.563,1920,NFC North
4,5,Baltimore Ravens,384,214,169,1,0.559,1996,AFC North
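
del changes the frame in place.  When the original should be kept, one alternative is DataFrame.drop, which returns a new frame.  A sketch of both, with illustrative names (frame, trimmed, still_there):

```python
import pandas as pd

frame = pd.DataFrame({'Team': ['Dallas Cowboys', 'Green Bay Packers'],
                      'Stadium': [0, 1]})

# drop returns a new frame and leaves the original untouched
trimmed = frame.drop(columns=['Stadium'])
still_there = 'Stadium' in frame.columns

# del removes the column from the original frame itself
del frame['Stadium']
print(list(trimmed.columns), still_there, list(frame.columns))
```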


You can also create a data frame by passing in a dictionary of equal-length lists, where each key becomes a column.

In [29]:
data = {'City': ['SF', 'LA', 'NYC'], 'Population': [837000, 3880000, 8400000]}

In [30]:
city_frame = DataFrame(data)

In [31]:
city_frame

Unnamed: 0,City,Population
0,SF,837000
1,LA,3880000
2,NYC,8400000
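
A frame built this way supports numeric work directly on its columns.  A short sketch using the same dictionary; the Share column and the names total and largest are illustrative additions, not part of the class notebook.

```python
import pandas as pd

data = {'City': ['SF', 'LA', 'NYC'], 'Population': [837000, 3880000, 8400000]}
city_frame = pd.DataFrame(data)

# Column-wise reduction
total = city_frame['Population'].sum()

# idxmax gives the index label of the largest value; loc then reads the city
largest = city_frame.loc[city_frame['Population'].idxmax(), 'City']

# Element-wise arithmetic creates a new column
city_frame['Share'] = city_frame['Population'] / total
print(total, largest)
print(city_frame)
```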
