# Getting started with Python

## This is a python notebook in Sherlock - This is where the magic happens

To run a command, click in the cell and click the play button above, or press Ctrl+Enter. Shift+Enter runs the cell and places your cursor in the next cell down, and Alt+Enter runs it and also adds a new cell below.

In [1]:
print ("Hello, Everyone!")

Hello, Everyone!


In [2]:
#shorthand for print(1+2): the value of the last line in a cell is displayed automatically (only the last line gets this treatment)
1+2 

3

In [3]:
for i in range(0,5):
    print (i)

0
1
2
3
4


#### Notebooks have two different keyboard input modes:
1. <b>Edit mode</b> allows you to type code/text into a cell and is indicated by a green cell border. 
2. <b>Command mode</b> binds the keyboard to notebook level actions and is indicated by a grey cell border.
<br>
> Change from edit to command mode by pressing `esc`, and change back by hitting `enter`.

#### Change, add and delete cells in command mode
- Change cell type from code to markdown by pressing `m`. Change it back to code with `y`. 
- Add a cell above with `a` and below with `b`
- Delete a cell with `dd`

## Libraries
The core libraries for data analysis are:
- Pandas
- Numpy
- Matplotlib

### Pandas
Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python.

The core data structure in pandas is the dataframe. It is well suited for many different kinds of data:
- Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
- Ordered and unordered (not necessarily fixed-frequency) time series data.
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
- Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure

## So here we meet pandas
## Pandas is your best friend

In [6]:
# Now we import pandas
import pandas as pd

#from now on, every time you type pd.method, python will look to see if that method exists within the pandas library 
#and if so, it will be called

So brilliant now what!

Now we want to load in the data to a dataframe.

A dataframe is close to a spreadsheet held in memory
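To make the spreadsheet comparison concrete, here is a minimal sketch that builds a tiny dataframe by hand. The team names and scores are invented for illustration and are not taken from the NFL file; `toy` is just a hypothetical variable name.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from 2015_stats.csv.
toy = pd.DataFrame({
    "Team": ["Lions", "Bears", "Jets"],
    "Score": [21, 14, 30],
})

print(toy.shape)    # (rows, columns)
print(toy.dtypes)   # one data type per column
```

Each dictionary key becomes a column header and each list becomes that column's values, much like columns in a spreadsheet.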

In [3]:
#first we need to find our file

In [4]:
ls 
#Remember our data is stored in a folder on our workspace called data

NameError: name 'ls' is not defined
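`ls` is a shell command rather than a Python name, which is why Python complains here. In a notebook, `%ls` or `!ls` will usually work instead. A pure-Python alternative is sketched below with `pathlib`; the folder name `data` is an assumption based on the comment above, so adjust it to wherever your files live.

```python
from pathlib import Path

# Assumed folder name; the notebook says the data sits in a folder called "data".
data_dir = Path("data")

# Collect CSV file names if the folder exists, otherwise use an empty list.
csv_files = sorted(p.name for p in data_dir.glob("*.csv")) if data_dir.is_dir() else []
print(csv_files)
```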

A word of advice: use meaningful words for names in your programming.

'nfl_data' you can read in two months and still understand

'a_new' not so much 

In [7]:
nfl_data = pd.read_csv('2015_stats.csv')
#nfl_data = pd.read_csv('/home/sherlock/workspace/data/2015_stats.csv')

# How to look at your data

The dataframe in my program is called nfl_data. 

However, people on the internet often use df as a generic name for a dataframe.

df.head()

df.tail()

df.sample()

### .sample() returns randomly chosen rows from the dataframe. This can be slow for large files.

In [8]:
nfl_data.sample(2)

Unnamed: 0,Date,Vis Team,Rushing Yards,Rushing Attempts,Passing Yards,Passing Attempts,Passing Completions,Penalties,Penalty Yards,Fumbles Lost,...,Penalty Yards.1,Fumbles Lost.1,Interceptions Thrown.1,1st Downs.1,3rd Down Attempts.1,3rd Down Conversions.1,4th Down Attempts.1,4th Down conversions.1,Time of Possession.1,Score.1
34,2015-09-27,Buccaneers,57,20,261,36,17,10,84,0,...,58,0,1,30,18,8,0,0,2177,19
21,2015-09-20,Patriots,56,15,451,59,38,11,119,2,...,140,0,3,23,13,4,1,1,1803,32


### .head() top of the dataframe
### .tail() bottom of the dataframe

In [9]:
nfl_data.head(2)

Unnamed: 0,Date,Vis Team,Rushing Yards,Rushing Attempts,Passing Yards,Passing Attempts,Passing Completions,Penalties,Penalty Yards,Fumbles Lost,...,Penalty Yards.1,Fumbles Lost.1,Interceptions Thrown.1,1st Downs.1,3rd Down Attempts.1,3rd Down Conversions.1,4th Down Attempts.1,4th Down conversions.1,Time of Possession.1,Score.1
0,Date,Vis Team,Rushing Yards,Rushing Attempts,Passing Yards,Passing Attempts,Passing Completions,Penalties,Penalty Yards,Fumbles Lost,...,Penalty Yards,Fumbles Lost,Interceptions Thrown,1st Downs,3rd Down Attempts,3rd Down Conversions,4th Down Attempts,4th Down conversions,Time of Possession,Score
1,2015-09-11,Steelers,134,25,330,38,26,8,77,0,...,64,0,0,26,11,7,0,0,1675,28


In [10]:
nfl_data.tail(2)

Unnamed: 0,Date,Vis Team,Rushing Yards,Rushing Attempts,Passing Yards,Passing Attempts,Passing Completions,Penalties,Penalty Yards,Fumbles Lost,...,Penalty Yards.1,Fumbles Lost.1,Interceptions Thrown.1,1st Downs.1,3rd Down Attempts.1,3rd Down Conversions.1,4th Down Attempts.1,4th Down conversions.1,Time of Possession.1,Score.1
255,2016-01-03,Ravens,59,21,282,56,30,4,38,0,...,72,0,0,17,9,0,1,1,1847,24
256,2016-01-04,Vikings,151,27,91,19,10,5,37,1,...,54,1,1,19,15,2,6,3,2142,13


#### looks like the top line is wrong

how can we fix it....

Well we need to learn a bit of basic python!

This book is free and very good!
https://learnpythonthehardway.org/


However I will take you on a whistle stop tour here of one simple thing.

###### A python list
A list is a simple data structure



In [11]:
list_letters = ['a','b','c','d','e','f']

To access a letter we simply give its position in the list.
Counting starts from 0!

In [12]:
list_letters[0]

'a'

In [13]:
list_letters[5]

'f'

To access more than one we can add a colon after the index we want to start from

In [14]:
list_letters[2:]

['c', 'd', 'e', 'f']

In [15]:
list_letters[1:4]

['b', 'c', 'd']

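Note that 1:4 gives the 2nd, 3rd and 4th items: the start index is included and the end index is not.

If you put a negative number after the colon you drop items from the end; here we drop the last 2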

In [16]:
list_letters[:-2]

['a', 'b', 'c', 'd']

An extra colon allows you to go in jumps

In [17]:
list_letters[::2]

['a', 'c', 'e']

But now we are simply showing off
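As a recap, the general slicing form is `list[start:stop:step]`. The sketch below reuses the same letters; the variable names `first_three`, `last_two` and `reversed_letters` are made up for this example.

```python
list_letters = ['a', 'b', 'c', 'd', 'e', 'f']

# General form: list_letters[start:stop:step]
# start is included, stop is excluded, step defaults to 1.
first_three = list_letters[:3]
last_two = list_letters[-2:]
reversed_letters = list_letters[::-1]   # a negative step walks backwards

print(first_three)
print(last_two)
print(reversed_letters)
```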


What is important here is that the rows of our pandas dataframe can be sliced just like a list!

So we want to drop the first row!

In [18]:
nfl_data[1:]

Unnamed: 0,Date,Vis Team,Rushing Yards,Rushing Attempts,Passing Yards,Passing Attempts,Passing Completions,Penalties,Penalty Yards,Fumbles Lost,...,Penalty Yards.1,Fumbles Lost.1,Interceptions Thrown.1,1st Downs.1,3rd Down Attempts.1,3rd Down Conversions.1,4th Down Attempts.1,4th Down conversions.1,Time of Possession.1,Score.1
1,2015-09-11,Steelers,134,25,330,38,26,8,77,0,...,64,0,0,26,11,7,0,0,1675,28
2,2015-09-13,Bengals,127,31,269,34,25,6,50,0,...,32,1,1,16,12,3,3,2,1648,13
3,2015-09-13,Seahawks,124,32,219,41,32,7,46,0,...,30,3,0,19,11,6,0,0,1712,34
4,2015-09-13,Saints,54,20,354,48,30,7,73,0,...,30,1,0,25,10,5,0,0,1596,31
5,2015-09-13,Lions,69,16,233,30,19,4,29,0,...,40,1,2,28,11,6,0,0,2292,33
6,2015-09-13,Packers,133,30,189,23,18,10,74,0,...,64,0,1,25,17,11,3,2,1912,23
7,2015-09-13,Chiefs,97,32,233,33,22,2,25,0,...,39,1,1,24,14,3,1,0,1481,20
8,2015-09-13,Browns,104,28,217,32,18,12,109,4,...,30,0,1,18,13,7,1,0,1700,31
9,2015-09-13,Titans,124,32,185,15,13,8,55,1,...,97,0,2,16,14,3,4,2,1709,14
10,2015-09-13,Panthers,105,35,158,31,18,3,22,0,...,25,1,2,18,12,3,2,1,1554,9


In [19]:
nfl_data

Unnamed: 0,Date,Vis Team,Rushing Yards,Rushing Attempts,Passing Yards,Passing Attempts,Passing Completions,Penalties,Penalty Yards,Fumbles Lost,...,Penalty Yards.1,Fumbles Lost.1,Interceptions Thrown.1,1st Downs.1,3rd Down Attempts.1,3rd Down Conversions.1,4th Down Attempts.1,4th Down conversions.1,Time of Possession.1,Score.1
0,Date,Vis Team,Rushing Yards,Rushing Attempts,Passing Yards,Passing Attempts,Passing Completions,Penalties,Penalty Yards,Fumbles Lost,...,Penalty Yards,Fumbles Lost,Interceptions Thrown,1st Downs,3rd Down Attempts,3rd Down Conversions,4th Down Attempts,4th Down conversions,Time of Possession,Score
1,2015-09-11,Steelers,134,25,330,38,26,8,77,0,...,64,0,0,26,11,7,0,0,1675,28
2,2015-09-13,Bengals,127,31,269,34,25,6,50,0,...,32,1,1,16,12,3,3,2,1648,13
3,2015-09-13,Seahawks,124,32,219,41,32,7,46,0,...,30,3,0,19,11,6,0,0,1712,34
4,2015-09-13,Saints,54,20,354,48,30,7,73,0,...,30,1,0,25,10,5,0,0,1596,31
5,2015-09-13,Lions,69,16,233,30,19,4,29,0,...,40,1,2,28,11,6,0,0,2292,33
6,2015-09-13,Packers,133,30,189,23,18,10,74,0,...,64,0,1,25,17,11,3,2,1912,23
7,2015-09-13,Chiefs,97,32,233,33,22,2,25,0,...,39,1,1,24,14,3,1,0,1481,20
8,2015-09-13,Browns,104,28,217,32,18,12,109,4,...,30,0,1,18,13,7,1,0,1700,31
9,2015-09-13,Titans,124,32,185,15,13,8,55,1,...,97,0,2,16,14,3,4,2,1709,14


Crap, why is it still the same.

Well that is because we didn't assign it to anything so it didn't save.
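Here is the same idea on a toy dataframe (invented values, standing in for `nfl_data`): slicing builds a new, shorter dataframe and leaves the original alone, so the result only survives if you give it a name.

```python
import pandas as pd

# Toy frame standing in for nfl_data; the values are invented.
df = pd.DataFrame({"Score": [0, 28, 13]})

df[1:]              # builds a new, shorter frame and then throws it away
print(len(df))      # df itself still has all 3 rows

trimmed = df[1:]    # keeping the result under a name preserves it
print(len(trimmed))
```

Assigning back to the same name, as the next cell does, simply replaces the old dataframe with the shorter one.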

In [224]:
nfl_data = nfl_data[1:]

In [225]:
nfl_data 

Unnamed: 0,Date,Vis Team,Rushing Yards,Rushing Attempts,Passing Yards,Passing Attempts,Passing Completions,Penalties,Penalty Yards,Fumbles Lost,...,Penalty Yards.1,Fumbles Lost.1,Interceptions Thrown.1,1st Downs.1,3rd Down Attempts.1,3rd Down Conversions.1,4th Down Attempts.1,4th Down conversions.1,Time of Possession.1,Score.1
1,2015-09-11,Steelers,134,25,330,38,26,8,77,0,...,64,0,0,26,11,7,0,0,1675,28
2,2015-09-13,Bengals,127,31,269,34,25,6,50,0,...,32,1,1,16,12,3,3,2,1648,13
3,2015-09-13,Seahawks,124,32,219,41,32,7,46,0,...,30,3,0,19,11,6,0,0,1712,34
4,2015-09-13,Saints,54,20,354,48,30,7,73,0,...,30,1,0,25,10,5,0,0,1596,31
5,2015-09-13,Lions,69,16,233,30,19,4,29,0,...,40,1,2,28,11,6,0,0,2292,33
6,2015-09-13,Packers,133,30,189,23,18,10,74,0,...,64,0,1,25,17,11,3,2,1912,23
7,2015-09-13,Chiefs,97,32,233,33,22,2,25,0,...,39,1,1,24,14,3,1,0,1481,20
8,2015-09-13,Browns,104,28,217,32,18,12,109,4,...,30,0,1,18,13,7,1,0,1700,31
9,2015-09-13,Titans,124,32,185,15,13,8,55,1,...,97,0,2,16,14,3,4,2,1709,14
10,2015-09-13,Panthers,105,35,158,31,18,3,22,0,...,25,1,2,18,12,3,2,1,1554,9


### Whoop!

# Now let's explore

In [226]:
#This gives us information on each column
nfl_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 256 entries, 1 to 256
Data columns (total 35 columns):
Date                       256 non-null object
 Vis Team                  256 non-null object
 Rushing Yards             256 non-null object
 Rushing Attempts          256 non-null object
 Passing Yards             256 non-null object
Passing Attempts           256 non-null object
 Passing Completions       256 non-null object
 Penalties                 256 non-null object
 Penalty Yards             256 non-null object
 Fumbles Lost              256 non-null object
Interceptions Thrown       256 non-null object
 1st Downs                 256 non-null object
 3rd Down Attempts         256 non-null object
 3rd Down Conversions      256 non-null object
4th Down Attempts          256 non-null object
 4th Down conversions      256 non-null object
 Time of Possession        256 non-null object
 Score                     256 non-null object
Home Team                  256 non-null object
 R

# We can go through the data more, but first let's draw a basic graph

In [227]:
import plotly.offline as off
# plotly.plotly is not used below (and moved to the separate chart_studio package in plotly 4), so it is not imported
import plotly.graph_objs as go

off.init_notebook_mode(connected=True)

In [228]:
trace_1 = go.Histogram(
            x=nfl_data['Rushing Yards'],
            name = 'Rushing Yards',
)


data = [trace_1]

layout = go.Layout(
    title = 'Rushing yards' 
    
)


fig = go.Figure(data=data, layout=layout)


off.iplot(fig)

KeyError: 'Rushing Yards'

You will see this error a lot!

It is saying that the column name you entered is not in your dataframe.
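One quick way to investigate a `KeyError` is to ask whether the name is really among the columns, and if not, whether a near-identical name is. This is only a sketch on an invented two-column frame; `wanted` and `close` are hypothetical names.

```python
import pandas as pd

# Toy frame with a leading space in one header, mimicking the problem here.
df = pd.DataFrame({" Rushing Yards": [134, 127], "Date": ["2015-09-11", "2015-09-13"]})

wanted = "Rushing Yards"
print(wanted in df.columns)    # exact match only, so the space matters

# Look for headers that match once surrounding whitespace is ignored.
close = [c for c in df.columns if c.strip() == wanted]
print(close)
```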

In [229]:
nfl_data.columns

Index(['Date', ' Vis Team', ' Rushing Yards', ' Rushing Attempts',
       ' Passing Yards', 'Passing Attempts', ' Passing Completions',
       ' Penalties', ' Penalty Yards', ' Fumbles Lost', 'Interceptions Thrown',
       ' 1st Downs', ' 3rd Down Attempts', ' 3rd Down Conversions',
       '4th Down Attempts', ' 4th Down conversions', ' Time of Possession',
       ' Score', 'Home Team', ' Rushing Yards.1', ' Rushing Attempts.1',
       ' Passing Yards.1', 'Passing Attempts.1', ' Passing Completions.1',
       ' Penalties.1', ' Penalty Yards.1', ' Fumbles Lost.1',
       'Interceptions Thrown.1', ' 1st Downs.1', ' 3rd Down Attempts.1',
       ' 3rd Down Conversions.1', '4th Down Attempts.1',
       ' 4th Down conversions.1', ' Time of Possession.1', ' Score.1'],
      dtype='object')

That's annoying: the column is actually called
' Rushing Yards' (with a leading space)

In [230]:
trace_1 = go.Histogram(
            x=nfl_data[' Rushing Yards'],
            name = 'Rushing Yards',
)


data = [trace_1]

layout = go.Layout(
    title = 'Rushing yards' 
    
)


fig = go.Figure(data=data, layout=layout)


off.iplot(fig)

#### Whoop!

But those spaces are going to annoy the hell out of us!

Let's go fix the buggers

In [231]:
# This is a for loop. 
# it is saying for each entry 
#(which i call col_name to make it easy to read later)
# print the col_name

for col_name in nfl_data.columns:
    print(col_name)

Date
 Vis Team
 Rushing Yards
 Rushing Attempts
 Passing Yards
Passing Attempts
 Passing Completions
 Penalties
 Penalty Yards
 Fumbles Lost
Interceptions Thrown
 1st Downs
 3rd Down Attempts
 3rd Down Conversions
4th Down Attempts
 4th Down conversions
 Time of Possession
 Score
Home Team
 Rushing Yards.1
 Rushing Attempts.1
 Passing Yards.1
Passing Attempts.1
 Passing Completions.1
 Penalties.1
 Penalty Yards.1
 Fumbles Lost.1
Interceptions Thrown.1
 1st Downs.1
 3rd Down Attempts.1
 3rd Down Conversions.1
4th Down Attempts.1
 4th Down conversions.1
 Time of Possession.1
 Score.1


In [232]:
for col_name in nfl_data.columns:
    
    #This is an if statement
    #if the first character of col_name is ' ' (a space)
    #print the name without that first character
    #otherwise print the name unchanged
    
    if col_name[0]==' ':
        print(col_name[1:])
    else:
        print(col_name)
        
#this would work (if we reassigned it) but there is a better way.

Date
Vis Team
Rushing Yards
Rushing Attempts
Passing Yards
Passing Attempts
Passing Completions
Penalties
Penalty Yards
Fumbles Lost
Interceptions Thrown
1st Downs
3rd Down Attempts
3rd Down Conversions
4th Down Attempts
4th Down conversions
Time of Possession
Score
Home Team
Rushing Yards.1
Rushing Attempts.1
Passing Yards.1
Passing Attempts.1
Passing Completions.1
Penalties.1
Penalty Yards.1
Fumbles Lost.1
Interceptions Thrown.1
1st Downs.1
3rd Down Attempts.1
3rd Down Conversions.1
4th Down Attempts.1
4th Down conversions.1
Time of Possession.1
Score.1


In [233]:
for col_name in nfl_data.columns:
    print(col_name.strip())
    
#Here we use the string method strip(), which already does this job
#(it removes whitespace from both ends); built-in methods like this are usually quicker

Date
Vis Team
Rushing Yards
Rushing Attempts
Passing Yards
Passing Attempts
Passing Completions
Penalties
Penalty Yards
Fumbles Lost
Interceptions Thrown
1st Downs
3rd Down Attempts
3rd Down Conversions
4th Down Attempts
4th Down conversions
Time of Possession
Score
Home Team
Rushing Yards.1
Rushing Attempts.1
Passing Yards.1
Passing Attempts.1
Passing Completions.1
Penalties.1
Penalty Yards.1
Fumbles Lost.1
Interceptions Thrown.1
1st Downs.1
3rd Down Attempts.1
3rd Down Conversions.1
4th Down Attempts.1
4th Down conversions.1
Time of Possession.1
Score.1


In [234]:
#Here we create an empty list, then fill it inside the loop
list_new_names = []

for col_name in nfl_data.columns: 
    list_new_names.append(col_name.strip())
list_new_names    

['Date',
 'Vis Team',
 'Rushing Yards',
 'Rushing Attempts',
 'Passing Yards',
 'Passing Attempts',
 'Passing Completions',
 'Penalties',
 'Penalty Yards',
 'Fumbles Lost',
 'Interceptions Thrown',
 '1st Downs',
 '3rd Down Attempts',
 '3rd Down Conversions',
 '4th Down Attempts',
 '4th Down conversions',
 'Time of Possession',
 'Score',
 'Home Team',
 'Rushing Yards.1',
 'Rushing Attempts.1',
 'Passing Yards.1',
 'Passing Attempts.1',
 'Passing Completions.1',
 'Penalties.1',
 'Penalty Yards.1',
 'Fumbles Lost.1',
 'Interceptions Thrown.1',
 '1st Downs.1',
 '3rd Down Attempts.1',
 '3rd Down Conversions.1',
 '4th Down Attempts.1',
 '4th Down conversions.1',
 'Time of Possession.1',
 'Score.1']

In [235]:
# Perfect now we just assign this list to the columns

nfl_data.columns = list_new_names
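The loop is good practice, but pandas can also strip every header in one step through the `.str` accessor on the column index. A minimal sketch on an invented frame:

```python
import pandas as pd

# Toy frame with messy headers; the values are invented.
df = pd.DataFrame({"Date": ["2015-09-11"], " Vis Team": ["Steelers"], " Score": [28]})

# .str applies a string method to every header at once.
df.columns = df.columns.str.strip()
print(list(df.columns))
```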

### Now scroll up and rerun the two graphs
### Do you understand why one graph works and one doesn't?

Now let's save this new dataframe, as it is now clean

In [236]:
nfl_data.to_csv('/home/sherlock/workspace/data/cleaned_nfl_data.csv',index=False)

### yep easy!

# Nope lets not stop yet.

Now let's do some very simple data wrangling

In [237]:
nfl_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 256 entries, 1 to 256
Data columns (total 35 columns):
Date                      256 non-null object
Vis Team                  256 non-null object
Rushing Yards             256 non-null object
Rushing Attempts          256 non-null object
Passing Yards             256 non-null object
Passing Attempts          256 non-null object
Passing Completions       256 non-null object
Penalties                 256 non-null object
Penalty Yards             256 non-null object
Fumbles Lost              256 non-null object
Interceptions Thrown      256 non-null object
1st Downs                 256 non-null object
3rd Down Attempts         256 non-null object
3rd Down Conversions      256 non-null object
4th Down Attempts         256 non-null object
4th Down conversions      256 non-null object
Time of Possession        256 non-null object
Score                     256 non-null object
Home Team                 256 non-null object
Rushing Yards.1      

### First how can we check how many teams are in the NFL?

Well we know they all must play at home, so we could print them all

In [238]:
nfl_data['Home Team']

1         Patriots
2          Raiders
3             Rams
4        Cardinals
5         Chargers
6            Bears
7           Texans
8             Jets
9       Buccaneers
10         Jaguars
11           Bills
12         Broncos
13        Redskins
14         Cowboys
15         Falcons
16           49ers
17          Chiefs
18         Jaguars
19          Saints
20        Steelers
21           Bills
22        Panthers
23        Redskins
24         Vikings
25         Bengals
26          Browns
27          Eagles
28         Raiders
29          Giants
30           Bears
          ...     
227     Buccaneers
228         Chiefs
229          Bills
230         Saints
231           Jets
232          Lions
233         Titans
234        Falcons
235         Ravens
236         Eagles
237       Dolphins
238       Seahawks
239        Vikings
240        Broncos
241        Broncos
242        Falcons
243         Texans
244          49ers
245         Giants
246        Cowboys
247         Chiefs
248         

But what we want is each team appearing just once. 

This is a set!

In [239]:
set(nfl_data['Home Team'])

{' 49ers',
 ' Bears',
 ' Bengals',
 ' Bills',
 ' Broncos',
 ' Browns',
 ' Buccaneers',
 ' Cardinals',
 ' Chargers',
 ' Chiefs',
 ' Colts',
 ' Cowboys',
 ' Dolphins',
 ' Eagles',
 ' Falcons',
 ' Giants',
 ' Jaguars',
 ' Jets',
 ' Lions',
 ' Packers',
 ' Panthers',
 ' Patriots',
 ' Raiders',
 ' Rams',
 ' Ravens',
 ' Redskins',
 ' Saints',
 ' Seahawks',
 ' Steelers',
 ' Texans',
 ' Titans',
 ' Vikings'}

Now we could manually count!

or we could use len()

In [240]:
len(set(nfl_data['Home Team']))

32
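pandas also has built-in shortcuts for this kind of counting. The sketch below uses an invented handful of team names rather than the real column; `n_teams` and `games_each` are made-up variable names.

```python
import pandas as pd

# Invented sample of home teams.
home = pd.Series(["Patriots", "Raiders", "Patriots", "Rams"], name="Home Team")

n_teams = home.nunique()          # number of distinct values
games_each = home.value_counts()  # how often each value appears

print(n_teams)
print(games_each)
```

`value_counts()` is handy as a sanity check, since it also shows how many home games each team has in the data.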

Well now we might want to look at how the teams did at home

pandas has a great command for this called groupby. It is slightly complex, but for now we will tell it to group by the home team and then sum the columns (mean is also possible)

In [241]:
nfl_data.groupby('Home Team').sum()

Unnamed: 0_level_0,Date,Vis Team,Rushing Yards,Rushing Attempts,Passing Yards,Passing Attempts,Passing Completions,Penalties,Penalty Yards,Fumbles Lost,...,Penalty Yards.1,Fumbles Lost.1,Interceptions Thrown.1,1st Downs.1,3rd Down Attempts.1,3rd Down Conversions.1,4th Down Attempts.1,4th Down conversions.1,Time of Possession.1,Score.1
Home Team,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
49ers,2015-09-152015-10-042015-10-182015-10-232015-1...,Vikings Packers Ravens Seahawks Falcons Cardi...,71 162 77 176 17 70 68 133,17 33 22 41 14 29 36 33,177 200 343 212 285 267 174 231,32 32 53 24 45 40 21 38,23 22 33 18 30 24 15 22,5 8 3 6 6 7 6 11,25 65 15 50 63 47 45 105,0 0 0 0 0 0 1 0,...,57 30 25 33 30 81 98 60,1 0 0 0 0 1 1 0,0 1 0 0 2 1 3 1,25 8 15 8 18 17 17 21,12 13 13 11 14 9 14 18,5 4 5 1 6 0 2 8,0 1 0 0 2 2 2 1,0 0 0 0 2 1 2 0,1982 1406 1613 1315 1672 1353 1741 1983,20 3 25 3 17 13 14 19
Bears,2015-09-132015-09-202015-10-042015-11-012015-1...,Packers Cardinals Raiders Vikings Broncos 49e...,133 115 70 147 170 121 99 67,30 28 22 25 36 23 33 21,189 185 173 180 219 170 275 282,23 24 33 30 27 32 31 39,18 17 20 17 20 18 24 28,10 8 5 4 8 6 8 4,74 58 47 35 118 50 63 21,0 1 1 0 0 0 0 0,...,64 170 48 38 0 72 79 15,0 0 2 0 1 0 1 0,1 2 1 0 1 1 0 3,25 18 23 18 19 20 20 16,17 12 17 12 12 17 11 12,11 2 10 3 4 8 5 7,3 1 1 1 1 0 0 0,2 0 1 1 0 0 0 0,1912 1796 2002 1962 1607 2244 1538 1650,23 23 22 20 15 20 21 20
Bengals,2015-09-202015-10-042015-10-112015-11-062015-1...,Chargers Chiefs Seahawks Browns Texans Rams S...,131 113 200 69 82 94 84 59,25 23 30 17 25 17 28 21,223 348 197 144 174 251 270 282,27 45 23 33 33 53 39 56,21 31 15 15 17 36 30 30,8 7 10 4 5 7 5 4,64 46 112 28 54 45 82 38,2 1 0 0 0 0 0 0,...,75 84 50 20 70 35 27 72,2 0 1 0 1 0 0 0,0 0 1 0 1 1 3 0,20 18 27 23 16 19 22 17,10 10 15 14 14 10 8 9,4 6 6 8 4 3 3 0,0 0 0 1 1 1 0 1,0 0 0 1 0 0 0 1,1791 1387 2449 2163 1886 1807 1453 1847,24 36 27 31 6 31 20 24
Bills,2015-09-132015-09-202015-10-042015-10-182015-1...,Colts Patriots Giants Bengals Dolphins Texans...,64 56 92 112 106 126 121 122,17 15 28 28 23 26 25 17,240 451 211 243 291 275 186 178,49 59 35 33 37 43 31 37,26 38 20 22 28 26 13 16,5 11 11 3 8 10 8 6,49 119 85 20 62 89 65 41,1 2 0 0 1 0 1 0,...,113 140 135 93 94 42 62 45,0 0 1 0 0 0 1 1,0 3 1 1 0 0 1 0,15 23 14 22 18 15 19 18,13 13 16 13 13 16 14 20,5 4 3 5 8 8 6 9,0 1 3 2 0 0 1 2,0 1 1 1 0 0 0 2,1856 1803 1767 1800 1873 1687 1890 2353,27 32 10 21 33 30 16 22
Broncos,2015-09-132015-10-042015-11-022015-11-152015-1...,Ravens Vikings Packers Chiefs Patriots Raider...,73 113 90 106 39 27 108 110,23 21 21 32 16 23 33 29,100 212 50 197 262 99 186 207,32 41 22 31 42 29 35 35,18 27 14 17 23 12 22 21,3 9 2 11 5 10 6 7,15 63 15 102 47 70 45 45,0 1 0 0 1 0 1 0,...,45 40 93 55 46 27 40 35,0 0 0 0 0 2 1 3,1 2 1 5 1 0 0 2,16 18 24 18 23 20 22 21,18 9 12 14 16 18 11 11,8 2 6 3 4 4 4 5,0 1 0 3 1 3 0 0,0 1 0 2 1 1 0 0,2237 1566 2007 1593 2191 2158 1670 1528,19 23 29 13 30 12 20 27
Browns,2015-09-202015-09-272015-10-182015-11-012015-1...,Titans Raiders Broncos Cardinals Ravens Benga...,166 155 152 119 104 144 71 30,30 30 33 38 23 33 17 19,219 314 290 372 232 233 150 349,37 32 48 38 34 22 28 36,21 20 26 23 20 16 18 24,9 12 8 5 7 5 5 9,85 85 81 25 50 55 40 45,3 1 0 3 0 0 0 1,...,40 50 30 35 15 84 71 47,0 1 1 1 0 1 1 2,0 1 2 1 0 1 1 2,13 21 20 16 21 18 28 22,11 16 15 16 15 13 12 16,4 8 6 9 4 6 6 4,1 1 1 1 2 2 0 4,0 1 0 0 2 0 0 1,1489 1651 2000 1537 1830 1742 2259 2037,28 20 23 20 27 3 24 12
Buccaneers,2015-09-132015-10-042015-10-112015-11-082015-1...,Titans Panthers Jaguars Giants Cowboys Falcon...,124 133 55 114 42 64 87 174,32 33 17 33 21 18 35 39,185 111 270 213 174 255 301 153,15 22 33 40 29 45 41 27,13 11 23 26 19 30 31 20,8 5 3 6 6 7 12 3,55 45 12 49 65 58 95 25,1 1 1 0 0 0 0 0,...,97 48 72 79 62 53 80 47,0 1 0 3 0 1 0 2,2 4 0 0 2 1 0 1,16 25 20 19 23 25 18 19,14 16 11 13 12 11 11 8,3 8 5 6 4 6 4 2,4 2 0 0 0 0 1 0,2 2 0 0 0 0 1 0,1709 1928 2081 1505 1796 1875 1366 1377,14 23 38 18 10 23 17 21
Cardinals,2015-09-132015-09-272015-10-042015-10-272015-1...,Saints 49ers Rams Ravens Bengals Vikings Pack...,54 103 164 55 99 72 101 145,20 29 26 16 28 24 26 37,354 53 164 221 278 317 77 209,48 19 24 40 39 36 29 32,30 9 16 26 22 25 16 22,7 6 7 9 10 3 7 6,73 45 66 64 108 25 59 70,0 0 0 1 1 3 3 0,...,30 24 20 40 40 30 119 43,1 0 2 0 0 0 1 0,0 1 1 0 2 0 1 3,25 28 26 21 21 22 19 16,10 13 11 12 11 13 10 12,5 6 2 5 5 6 5 5,0 1 2 0 0 1 0 1,0 1 1 0 0 1 0 0,1596 2192 1993 1776 1725 1919 1816 1403,31 47 22 26 34 23 38 6
Chargers,2015-09-132015-10-042015-10-132015-10-252015-1...,Lions Browns Steelers Raiders Bears Chiefs Br...,69 100 155 130 109 153 134 44,16 21 27 26 29 31 39 19,233 332 194 282 337 232 159 187,30 41 26 31 40 25 26 34,19 32 13 24 27 20 16 20,4 12 8 14 6 5 3 9,29 91 86 136 55 35 18 72,0 1 0 0 1 0 0 1,...,40 31 54 86 54 39 72 48,1 0 1 0 1 1 2 0,2 0 1 2 0 1 1 2,28 23 24 28 19 14 15 26,11 11 14 16 13 13 14 14,6 3 4 8 7 4 5 9,0 0 0 0 1 1 2 0,0 0 0 0 0 0 0 0,2292 1541 2023 1751 1585 1592 1623 2333,33 30 20 29 19 3 3 30
Chiefs,2015-09-182015-10-112015-10-252015-11-012015-1...,Broncos Bears Steelers Lions Bills Chargers B...,61 87 147 81 129 44 232 48,22 26 24 14 28 18 36 16,238 241 192 195 286 236 136 157,45 45 29 38 38 43 32 33,26 26 16 23 21 24 13 21,8 6 1 8 9 7 8 7,85 49 5 81 91 48 83 59,0 1 1 0 2 0 0 0,...,60 49 66 20 58 53 33 55,3 0 0 0 0 1 0 0,2 0 0 0 0 1 1 2,20 16 18 24 21 18 16 23,7 14 16 13 13 11 12 13,0 5 9 8 6 6 4 6,0 0 0 0 0 0 1 2,0 0 0 0 0 0 1 1,1739 1583 1946 2054 1768 1732 1736 2095,24 17 23 45 30 10 17 23


Once again balls!

In [242]:
#.iloc[0] picks out a single value from a pandas column by position (here the first one)
nfl_data['Rushing Yards'].iloc[0]

' 134'

# that is annoying!

You might wonder why our graphs worked then!
The answer is that the simple commands actually call on more complex commands that have built-in routines to deal with this. 

There are multiple ways to deal with this error, but let's actually think about what is happening.

The numbers are being read as words (strings) because when we first imported the file the first entry in each column was a word: the header row appears twice in the file.

So it is actually better just to reimport the file, skipping the first line!

In [243]:
nfl_data = pd.read_csv('/home/sherlock/workspace/data/2015_stats.csv',skiprows=1)
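To see why skipping a line helps, here is a small sketch with an invented two-column file (not the real data) whose header line appears twice. `io.StringIO` just lets pandas read a string as if it were a file; `raw`, `messy` and `clean` are hypothetical names.

```python
import io
import pandas as pd

# A tiny invented file whose header line appears twice.
raw = "Team,Score\nTeam,Score\nLions,21\nBears,14\n"

messy = pd.read_csv(io.StringIO(raw))               # second header becomes a data row
clean = pd.read_csv(io.StringIO(raw), skiprows=1)   # skip the first line of the file

# The stray text 'Score' stops pandas from treating the column as numbers.
print(pd.api.types.is_numeric_dtype(messy["Score"]))
print(pd.api.types.is_numeric_dtype(clean["Score"]))
print(clean["Score"].sum())
```

With the duplicate header out of the way, the column is read as numbers and `sum()` adds them up instead of gluing text together.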

In [244]:
nfl_data.groupby('Home Team').sum()

Unnamed: 0_level_0,Rushing Yards,Rushing Attempts,Passing Yards,Passing Attempts,Passing Completions,Penalties,Penalty Yards,Fumbles Lost,Interceptions Thrown,1st Downs,...,Penalty Yards.1,Fumbles Lost.1,Interceptions Thrown.1,1st Downs.1,3rd Down Attempts.1,3rd Down Conversions.1,4th Down Attempts.1,4th Down conversions.1,Time of Possession.1,Score.1
Home Team,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
49ers,774,225,1889,285,187,52,415,1,5,159,...,414,3,8,129,104,31,8,5,13065,114
Bears,922,218,1673,239,162,53,466,2,4,155,...,486,4,9,159,110,50,7,4,14711,164
Bengals,832,186,1889,309,195,50,469,3,9,151,...,433,4,6,162,90,34,4,2,14783,199
Bills,799,179,2075,324,189,62,530,5,8,169,...,724,3,6,144,118,48,9,5,15029,191
Broncos,666,198,1313,267,154,53,402,3,3,118,...,381,6,12,162,109,36,8,5,14950,173
Browns,941,223,2159,275,168,60,466,8,8,153,...,372,7,8,159,114,47,12,4,14545,157
Buccaneers,793,228,1662,252,173,50,404,3,5,162,...,538,7,10,165,96,38,7,5,13637,164
Cardinals,793,206,1673,267,166,55,510,8,7,144,...,346,4,8,178,92,39,5,3,14420,227
Chargers,894,208,1956,253,171,61,522,3,5,148,...,424,6,9,177,106,46,4,0,14740,167
Chiefs,829,184,1681,303,170,54,501,4,8,151,...,394,4,6,156,99,44,3,2,14653,189


# Simple! 

##### Now let's note that by doing df[1:] we screwed ourselves later!

##### Also our 'clean data' is now dirty!

Also this whole .1 business is odd too and will lead to mistakes.

In [245]:
list_new_names = []

for col_name in nfl_data.columns: 
    new_name = col_name.strip()
    
    #the in keyword is useful, and means what it looks like:
    #is '.1' found anywhere inside new_name?
    if '.1' in new_name:
        #yep, you can slice strings just like a list (here dropping the last 2 characters)
        #yep, you can add strings together
        new_name = new_name[:-2]+'_H'
    else:
        if new_name not in ['Home Team','Vis Team','Date']:
            new_name = new_name +'_V'
        
    list_new_names.append(new_name)

nfl_data.columns = list_new_names

In [5]:
nfl_data.to_csv('/home/sherlock/workspace/data/ben/cleaned_nfl_data.csv',index=False)

##### Now we can look at the data

In [247]:
Home_data = nfl_data.groupby('Home Team').sum()

In [248]:
trace_1 = go.Histogram(
            x=Home_data['Score_H'],
            name = 'Score',
)


data = [trace_1]

layout = go.Layout(
    title = 'Total score' 
    
)


fig = go.Figure(data=data, layout=layout)


off.iplot(fig)

### But we kind of want average score!

In [249]:
Home_data['Average score'] = Home_data['Total score']/ ??

SyntaxError: invalid syntax (<ipython-input-249-4e90e9aada43>, line 1)

In [250]:
nfl_data['Count'] = 1

In [251]:
Home_data = nfl_data.groupby('Home Team').sum()

In [252]:
Home_data['Average Home score'] = Home_data['Score_H']/ Home_data['Count']

In [253]:
Home_data['Average Away score'] = Home_data['Score_V']/ Home_data['Count']
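The `Count` column trick works well. As an alternative sketch, groupby can also compute the averages directly with `mean()`, and `size()` gives the number of games per team. The scores below are invented, and `games`, `averages` and `games_played` are hypothetical names.

```python
import pandas as pd

# Invented scores for two home teams.
games = pd.DataFrame({
    "Home Team": ["Lions", "Lions", "Bears"],
    "Score_H": [20, 30, 14],
    "Score_V": [10, 24, 21],
})

# Select the numeric columns first, then average them per home team.
averages = games.groupby("Home Team")[["Score_H", "Score_V"]].mean()
games_played = games.groupby("Home Team").size()

print(averages)
print(games_played)
```

Picking the numeric columns before calling `mean()` avoids asking pandas to average text columns such as the date.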

In [254]:
trace_1 = go.Histogram(
            x=Home_data['Average Home score'],
            name = 'Home Score',
)

trace_2 = go.Histogram(
            x=Home_data['Average Away score'],
            name = 'Away Score',
)



data = [trace_1,trace_2]

layout = go.Layout(
    title = 'Average score' 
    
)


fig = go.Figure(data=data, layout=layout)


off.iplot(fig)

#### But that's a crap graph! 

How do we draw it properly?

In [255]:
trace_1 = go.Bar(
            x=Home_data.index,
            y=Home_data['Average Home score'],
            name = 'Home Score',
)

trace_2 = go.Bar(
            x=Home_data.index,
            y=Home_data['Average Away score'],
            name = 'Away Score',
)



data = [trace_1,trace_2]

layout = go.Layout(
    title = 'Average score' 
    
)


fig = go.Figure(data=data, layout=layout)


off.iplot(fig)

Now we are starting to cook!

In [256]:
trace_1 = go.Scatter(
            x=Home_data['Average Home score'],
            y=Home_data['Average Away score'],
            name = 'Score',
            text=Home_data.index,
            mode = 'markers'
)




data = [trace_1]

layout = go.Layout(
    title = 'Comparing the average score of a home team against away teams', 
    xaxis = dict(
        title= 'Home Score',
        range = [8,35]
    ),
    yaxis=dict(
        title= 'Away Score',
        range = [8,35]
    )
    
)


fig = go.Figure(data=data, layout=layout)


off.iplot(fig)

## What else can we do

Pandas is very useful for selecting things

In [261]:

nfl_data['Vis Team']== ' Steelers'

0       True
1      False
2      False
3      False
4      False
5      False
6      False
7      False
8      False
9      False
10     False
11     False
12     False
13     False
14     False
15     False
16     False
17     False
18     False
19     False
20     False
21     False
22     False
23     False
24     False
25     False
26     False
27     False
28     False
29     False
       ...  
226    False
227    False
228    False
229    False
230    False
231    False
232    False
233    False
234     True
235    False
236    False
237    False
238    False
239    False
240    False
241    False
242    False
243    False
244    False
245    False
246    False
247    False
248     True
249    False
250    False
251    False
252    False
253    False
254    False
255    False
Name: Vis Team, dtype: bool

In [262]:
nfl_data[nfl_data['Vis Team']== ' Steelers']

Unnamed: 0,Date,Vis Team,Rushing Yards_V,Rushing Attempts_V,Passing Yards_V,Passing Attempts_V,Passing Completions_V,Penalties_V,Penalty Yards_V,Fumbles Lost_V,...,Fumbles Lost_H,Interceptions Thrown_H,1st Downs_H,3rd Down Attempts_H,3rd Down Conversions_H,4th Down Attempts_H,4th Down conversions_H,Time of Possession_H,Score_H,Count
0,2015-09-11,Steelers,134,25,330,38,26,8,77,0,...,0,0,26,11,7,0,0,1675,28,1
45,2015-09-27,Steelers,62,22,197,30,25,4,63,0,...,0,1,12,10,2,2,0,1661,6,1
76,2015-10-13,Steelers,155,27,194,26,13,8,86,0,...,1,1,24,14,4,0,0,2023,20,1
100,2015-10-25,Steelers,147,24,192,29,16,1,5,1,...,0,0,18,16,9,0,0,1946,23,1
169,2015-11-29,Steelers,58,14,480,59,37,6,65,0,...,0,0,21,13,7,0,0,1680,39,1
204,2015-12-13,Steelers,84,28,270,39,30,5,82,0,...,0,3,22,8,3,0,0,1453,20,1
234,2015-12-27,Steelers,110,20,198,34,24,3,25,1,...,0,0,22,18,9,0,0,2060,20,1
248,2016-01-03,Steelers,30,19,349,36,24,9,45,1,...,2,2,22,16,4,4,1,2037,12,1


In [265]:
nfl_data[(nfl_data['Vis Team']== ' Steelers')
         |(nfl_data['Home Team']== ' Steelers')]

Unnamed: 0,Date,Vis Team,Rushing Yards_V,Rushing Attempts_V,Passing Yards_V,Passing Attempts_V,Passing Completions_V,Penalties_V,Penalty Yards_V,Fumbles Lost_V,...,Fumbles Lost_H,Interceptions Thrown_H,1st Downs_H,3rd Down Attempts_H,3rd Down Conversions_H,4th Down Attempts_H,4th Down conversions_H,Time of Possession_H,Score_H,Count
0,2015-09-11,Steelers,134,25,330,38,26,8,77,0,...,0,0,26,11,7,0,0,1675,28,1
19,2015-09-20,49ers,111,31,298,46,33,7,46,1,...,0,0,21,10,6,0,0,1381,43,1
45,2015-09-27,Steelers,62,22,197,30,25,4,63,0,...,0,1,12,10,2,2,0,1661,6,1
48,2015-10-02,Ravens,191,39,165,34,21,4,30,1,...,0,0,17,13,2,2,0,1929,20,1
76,2015-10-13,Steelers,155,27,194,26,13,8,86,0,...,1,1,24,14,4,0,0,2023,20,1
78,2015-10-18,Cardinals,55,20,414,45,29,9,111,1,...,0,0,14,12,3,0,0,1731,25,1
100,2015-10-25,Steelers,147,24,192,29,16,1,5,1,...,0,0,18,16,9,0,0,1946,23,1
107,2015-11-01,Bengals,78,23,218,38,23,10,94,0,...,0,3,21,11,3,0,0,1809,10,1
120,2015-11-08,Raiders,139,25,301,44,24,3,21,3,...,1,1,27,17,6,2,1,1924,38,1
136,2015-11-15,Browns,15,14,327,45,33,12,188,2,...,1,1,22,10,3,1,0,1637,30,1
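Two things are going on in the cells above: each comparison produces a column of True/False values (a mask), and masks can be combined with `|` (or) and `&` (and), each wrapped in brackets. The sketch below uses invented games and also strips the leading space from the team names so the comparison no longer needs `' Steelers'`; all variable names are hypothetical.

```python
import pandas as pd

# Invented games; note the leading spaces, as in the real team names.
games = pd.DataFrame({
    "Vis Team": [" Steelers", " 49ers", " Bengals"],
    "Home Team": [" Patriots", " Steelers", " Browns"],
    "Score_H": [28, 43, 10],
})

# Strip the padding once so comparisons no longer need the leading space.
for col in ["Vis Team", "Home Team"]:
    games[col] = games[col].str.strip()

away = games["Vis Team"] == "Steelers"    # True/False for every row
home = games["Home Team"] == "Steelers"

either = games[away | home]                                  # | means "or"
high_scoring_home = games[home & (games["Score_H"] > 40)]    # & means "and"

print(len(either), len(high_scoring_home))
```

If the padding is a nuisance throughout, `pd.read_csv` also accepts `skipinitialspace=True`, which drops spaces that follow each comma when the file is loaded.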
