# Welcome to the Dark Art of Coding:
## Introduction to Python
pandas: Series & DataFrames

<img src='../images/dark_art_logo.600px.png' width='300' style="float:right">

# Objectives
---

In this session, students should expect to:

* understand the purpose and application of pandas to data analysis problems
* understand how to create and use a Series
* understand how to create and use a DataFrame
* explore various simple examples of pandas usage


# `pandas` basics
---

`pandas` is one of the premier data analysis libraries in the Python ecosystem. It offers high-performance, easy-to-use data structures and data analysis tools enabling you to carry out your entire data analysis workflow.

`pandas` is used for:

* data analysis/science
* financial analysis
* data manipulation
* data cleansing
* data transformation

`pandas` has tools to read and write data in many formats, including CSV, Excel, JSON and SQL databases.

It also includes tools that simplify:

* grouping data
* applying transformations to columns, rows and individual cells
* working with dates and times
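To get a quick taste of the read/write tools, here is a minimal sketch that round-trips a tiny table through CSV and JSON text. The data is made up, and `io.StringIO` stands in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = "hero,emails\nbilly,111\nselina,211\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# With no path argument, to_csv / to_json return the text instead of writing a file.
as_csv = df.to_csv(index=False)          # index=False leaves out the row labels
as_json = df.to_json(orient='records')   # one JSON object per row

print(as_csv)
print(as_json)
```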

# List vs. Dict vs. Series vs. DataFrame
---

## list 
```python
# LIST:
mylist = ['A', 'B', 'C']
```

**indexable**: 

`mylist[0]` by integer            

**sliceable**: 

`mylist[0:2]` by integer
 

## dict
```python
# DICTIONARY:
mydict = {'alpha': 1,
          'beta': 2,
          'gamma': 2}
```

**indexable**: 

`mydict['alpha']` by key


## Series
```
myseries = Series(['bruce', 'selina', 'kara', 'clark'], index=[0, 1, 2, 'three'])
          column
rows
0         'bruce'
1         'selina'
2         'kara'
'three'   'clark'
```

**indexable**: 

`myseries[0]` by integer

`myseries['three']` by row name

**sliceable**:

`myseries[0:3]`                  

## DataFrame:
```
mydataframe = DataFrame(lots of data...)
        col1      col2        col3      age
rows
0       'bruce'   'wayne'     'M'       42
1       'selina'  'kyle'      'F'       34
'two'   'kara'    'zor-el'    'F'       27
3       'clark'   'kent'      'M'       35
```

**indexable**:

by column(s) with brackets; rows are selected with `.loc` (by label) or `.iloc` (by position)

`mydataframe['col1']`

`mydataframe[['col1', 'age']]`
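The diagrams above are pseudocode. As a minimal sketch, the same two objects can be built with the real constructors. The mixed labels (`0, 1, 2, 'three'` and `0, 1, 'two', 3`) are copied from the diagrams only to show that labels and positions are different things; real code rarely mixes label types like this.

```python
import pandas as pd

# Series: one labelled column of values, with the diagram's mixed labels.
myseries = pd.Series(['bruce', 'selina', 'kara', 'clark'],
                     index=[0, 1, 2, 'three'])

# DataFrame: several columns that share one row index.
mydataframe = pd.DataFrame({'col1': ['bruce', 'selina', 'kara', 'clark'],
                            'col2': ['wayne', 'kyle', 'zor-el', 'kent'],
                            'col3': ['M', 'F', 'F', 'M'],
                            'age': [42, 34, 27, 35]},
                           index=[0, 1, 'two', 3])

print(myseries['three'])             # by row label
print(myseries.iloc[0])              # by position
print(mydataframe['col1'])           # one column name -> a Series
print(mydataframe[['col1', 'age']])  # list of column names -> a DataFrame
print(mydataframe.loc['two'])        # one row by label -> a Series
```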



# Series
---

In [2]:
# Let's start by making a simple Series.
# It is customary to import pandas by the alias: pd

import pandas as pd
from pandas import Series

s = Series([33, 37, 27, 42])

# pandas will assign an index automatically starting at "0"

s

0    33
1    37
2    27
3    42
dtype: int64

In [2]:
# We can see that the object is a Series object

print(type(s))

<class 'pandas.core.series.Series'>


In [4]:
# Series objects can be assigned a name 
# The index can also be assigned directly.

s.name = 'Justice League ages'
s.index = ['bruce', 'selina', 'kara', 'clark']

s

bruce     33
selina    37
kara      27
clark     42
Name: Justice League ages, dtype: int64

In [5]:
s['bruce']

33

In [11]:
# The Series constructor lets you set attributes
#     such as the index and the name directly.

s1 = Series([37, 36, 10, 36],
            index=['hal', 'victor', 'diana', 'billy'],
            name='More Justice League ages')
s1

# Generally, any ordered iterable can be used to produce the inputs
#     to a Series. We used a list here.
#     range(), generators, arrays, etc

hal       37
victor    36
diana     10
billy     36
Name: More Justice League ages, dtype: int64

In [7]:
# Accessing a row directly uses brackets and the 
#     name of the row.

s1['billy']

36

In [8]:
# Accessing multiple rows uses the names of 
#     the rows embedded in a list

s1[   ['billy', 'victor', 'hal']     ]

billy     36
victor    36
hal       37
Name: More Justice League ages, dtype: int64

In [9]:
# Accessing multiple rows may also use slice notation.
#     Slice notation works with both integer positions and
#     string labels. Note: a label-based slice INCLUDES its
#     end label ('diana' below), unlike a list slice.

s1['hal':'diana']

hal       37
victor    36
diana     10
Name: More Justice League ages, dtype: int64

In [10]:
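# slice notation using integers still works even if we have
#     applied a string-based index. Integer slices are positional and,
#     like list slices, exclude the end position; s1.iloc[0:3] states
#     that intent explicitly.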

s1[0:3]

hal       37
victor    36
diana     10
Name: More Justice League ages, dtype: int64
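Plain brackets have to guess whether you mean a label or a position. A minimal sketch of the explicit accessors, reusing the `s1` values from above: `.loc` slices by label and includes the end label, while `.iloc` slices by position and excludes the end, just like a list.

```python
import pandas as pd

s1 = pd.Series([37, 36, 10, 36],
               index=['hal', 'victor', 'diana', 'billy'],
               name='More Justice League ages')

# Label-based slice: the end label 'diana' IS included.
by_label = s1.loc['hal':'diana']

# Position-based slice: position 2 is NOT included.
by_position = s1.iloc[0:2]

print(by_label)
print(by_position)
```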

In [12]:
# Similarly, assignment of a value to a row
# uses bracket indexing

s1['diana'] = 32
s1

hal       37
victor    36
diana     32
billy     36
Name: More Justice League ages, dtype: int64

In [16]:
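# Comparison operators such as ==, <=, >= produce a boolean Series
#     (like the s1 > 35 output below). Placing a boolean Series inside
#     the brackets, as in s1[s1 == 32], keeps only the rows marked True.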

s1[s1 == 32]

s1 > 35

hal        True
victor     True
diana     False
billy      True
Name: More Justice League ages, dtype: bool
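Boolean masks can be combined. A minimal sketch with the same ages: use `&` (and), `|` (or) and `~` (not) rather than the Python keywords `and`/`or`, which raise an error on a Series. Each comparison needs its own parentheses because `&` and `|` bind more tightly than comparison operators.

```python
import pandas as pd

s1 = pd.Series([37, 36, 32, 36], index=['hal', 'victor', 'diana', 'billy'])

mask = s1 > 35        # boolean Series, one True/False per row
over_35 = s1[mask]    # keep only the rows where the mask is True

# Combine conditions element-wise; the parentheses are required.
between = s1[(s1 > 32) & (s1 < 37)]
not_36 = s1[~(s1 == 36)]

print(over_35)
print(between)
print(not_36)
```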

# Experience Points!
---

In your **text editor** create a simple script called:

```bash
my_series_01.py
```

Create a pandas Series called `restaurant_ratings` according to the following guidelines:

* starting at the top, each row should contain one number from 1 to 5 (inclusive)
* give the series the name `ratings`
* give the series an index with the names of five restaurants

Execute your script in the **IPython interpreter** using the command:

```bash
run my_series_01.py
```


From the IPython shell, explore your Series object by performing our typical explorations:
* `type()`
* `.<tab complete>`

Also look at the attributes you have added: 

* `.name`
* `.index`

Lastly, extract particular records from your Series:

* Choose a record from the Series using the name of one of your restaurants
* Choose three records from the Series using a list of names of restaurants



When you complete this exercise, please put your green post-it on your monitor. 

If you want to continue on at your own pace, please feel free to do so.

<img src='../images/green_sticky.300px.png' width='200' style='float:left'>

# Experience Points!
---

In your **text editor** create a simple script called:

```bash
my_series_02.py
```

Create a pandas Series called `bacteria_lengths` according to the following guidelines:

* starting at the top, each row should contain one number from 1 to 5000 (inclusive) incrementing by 100
* give the series the name `length`
* Do not worry about an index for this series. Simply use the default indexing that Series provide.

Execute your script from the **terminal/command line** using the command:

```bash
ipython -i my_series_02.py
```


From the IPython Interpreter, explore your Series object by performing our typical explorations:
* `type()`
* `.<tab complete>`

Also look at the attributes you have added or that were created by default: 

* `.name`
* `.index`

Lastly, extract particular records from your Series:

* Choose a record from the Series at index 23
* Choose the five records from the Series at index 23 through 27

When you complete this exercise, please put your green post-it on your monitor. 

If you want to continue on at your own pace, please feel free to do so.

<img src='../images/green_sticky.300px.png' width='200' style='float:left'>

In [17]:
# Much like numpy, pandas Series (and DataFrames)
#     support vectorized arithmetic: an operation such as * 2
#     is applied to every row (or cell) at once, WITHOUT a for loop.
#     The result is a new Series; s1 itself is left unchanged.

s1 * 2

hal       74
victor    72
diana     64
billy     72
Name: More Justice League ages, dtype: int64

In [18]:
s1

hal       37
victor    36
diana     32
billy     36
Name: More Justice League ages, dtype: int64

In [19]:
s1[['diana', 'billy']] * 20

diana    640
billy    720
Name: More Justice League ages, dtype: int64

In [20]:
s1

hal       37
victor    36
diana     32
billy     36
Name: More Justice League ages, dtype: int64

In [None]:
'diana' in s1

In [24]:
42 in [1, 2, 3, 4]

False

In [22]:
'lex' in s1

False
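Note what `in` actually checks. A minimal sketch with the ages from above: `in` on a Series looks at the index labels, much like `in` on a dict looks at its keys. To test membership among the values, ask the values explicitly, or use `.isin()` for an element-wise test.

```python
import pandas as pd

s1 = pd.Series([37, 36, 32, 36], index=['hal', 'victor', 'diana', 'billy'])

label_hit = 'diana' in s1         # True: 'diana' is a label
value_as_label = 37 in s1         # False: 37 is a value, not a label
value_hit = 37 in s1.values       # True: searches the underlying values
membership = s1.isin([32, 37])    # boolean Series, one entry per row

print(label_hit, value_as_label, value_hit)
print(membership)
```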

In [25]:
names = {'bruce wayne': 'bwayne@jleague.org',
         'hal jordan': 'hjordan@jleague.org',
         'clark kent': 'ckent@jleague.org',
         'barry allen': 'ballen@jleague.org',
         'diana prince': 'dprince@jleague.org',
         'arthur curry': 'acurry@jleague.org',
         'billy batson': 'bbatson@jleague.org',
         'john jones': 'jjones@jleague.org',
         'victor stone': 'vstone@jleague.org',
         'dick grayson': 'dgrayson@jleague.org',
         'ray palmer': 'rpalmer@jleague.org',
         'dinah lance': 'dlance@jleague.org',
         'kara zor-el': 'kzor-el@jleague.org',
         'john constantine': 'jconstantine@jleague.org',
         'barbara gordon': 'bgordon@jleague.org',
         'kyle rayner': 'krayner@jleague.org',
         'selina kyle': 'skyle@jleague.org',
         'wally west': 'wwest@jleague.org'
         }

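emails = Series(names)
# Note: the output below lists the keys alphabetically because it came from an
#     older pandas; pandas 0.23+ on Python 3.6+ keeps the dict's insertion order.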
# emails.index
# emails.values

In [26]:
emails

arthur curry              acurry@jleague.org
barbara gordon           bgordon@jleague.org
barry allen               ballen@jleague.org
billy batson             bbatson@jleague.org
bruce wayne               bwayne@jleague.org
clark kent                 ckent@jleague.org
diana prince             dprince@jleague.org
dick grayson            dgrayson@jleague.org
dinah lance               dlance@jleague.org
hal jordan               hjordan@jleague.org
john constantine    jconstantine@jleague.org
john jones                jjones@jleague.org
kara zor-el              kzor-el@jleague.org
kyle rayner              krayner@jleague.org
ray palmer               rpalmer@jleague.org
selina kyle                skyle@jleague.org
victor stone              vstone@jleague.org
wally west                 wwest@jleague.org
dtype: object

In [28]:
emails[['barry allen', 'hal jordan', 'john jones']]

barry allen     ballen@jleague.org
hal jordan     hjordan@jleague.org
john jones      jjones@jleague.org
dtype: object

# Analyzing data
---

In [31]:
s1 = Series(range(10, 16), index=['a', 'b', 'c', 'd', 'e', 'f'])
s2 = Series(range(16, 22), index=['a', 'b', 'c', 'x', 'y', 'z'])

# s1
# s2

In [32]:
s2

a    16
b    17
c    18
x    19
y    20
z    21
dtype: int64

In [33]:
s1

a    10
b    11
c    12
d    13
e    14
f    15
dtype: int64

In [37]:
s3 = s1 + s2
s1 + s2

# type(s3)
# pd.isnull(s3)
# s3.isnull()
# s3.<tab>

a    26.0
b    28.0
c    30.0
d     NaN
e     NaN
f     NaN
x     NaN
y     NaN
z     NaN
dtype: float64

In [38]:
s3.isnull()

a    False
b    False
c    False
d     True
e     True
f     True
x     True
y     True
z     True
dtype: bool
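The NaN rows above appear because `+` aligns the two Series on their index labels: a label found in only one Series has no partner, so its sum is missing. A minimal sketch of the alternative: the `.add()` method takes a `fill_value` that stands in for a missing partner.

```python
import pandas as pd

s1 = pd.Series(range(10, 16), index=['a', 'b', 'c', 'd', 'e', 'f'])
s2 = pd.Series(range(16, 22), index=['a', 'b', 'c', 'x', 'y', 'z'])

# `+` aligns on labels; labels present in only one Series become NaN.
plain = s1 + s2

# .add() treats a missing partner as 0 instead of producing NaN.
filled = s1.add(s2, fill_value=0)

print(plain.isnull().sum())   # how many labels had no partner
print(filled)
```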

In [39]:
# How do I learn more?
# s3.<method_name>?        # just ask by typing the method name (without parentheses) and
#                          # adding a question mark to see the built-in help docs
# 
# s3.value_counts?
# s3.value_counts(dropna=False)

In [40]:
s3.value_counts?

In [42]:
s3.dropna?
# s3

In [43]:
s4 = Series([42, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 5, 6])

# s4.unique()
# s4.value_counts()
# s4.max()
# s4 + 2

In [46]:
s4.unique()

array([42,  1,  2,  3,  4,  5,  6])

In [47]:
def transmogrifier(x):
    '''Hat tip to Calvin and Hobbes for introducing me to this
    truly fantastic word. Thanks, Bill Watterson.

    "transform, especially in a surprising or magical manner."
    '''
    new_val = '- ' + str(x ** 3) + ' -'
    return new_val

s4.apply(transmogrifier)


0     - 74088 -
1         - 1 -
2         - 1 -
3         - 1 -
4         - 8 -
5         - 8 -
6        - 27 -
7        - 27 -
8        - 27 -
9        - 27 -
10       - 27 -
11       - 27 -
12       - 64 -
13      - 125 -
14      - 216 -
dtype: object
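`.apply()` calls your Python function once per element, which is flexible but slow on large data. As a minimal sketch, the same strings can be produced with whole-Series operations: cube every value, convert the results to text, then concatenate. The copy of `transmogrifier` below drops the docstring but is otherwise unchanged.

```python
import pandas as pd

s4 = pd.Series([42, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 5, 6])


def transmogrifier(x):
    new_val = '- ' + str(x ** 3) + ' -'
    return new_val


# One Python function call per element.
via_apply = s4.apply(transmogrifier)

# Vectorized: cube, convert to str, then concatenate strings element-wise.
vectorized = '- ' + (s4 ** 3).astype(str) + ' -'

print(via_apply.tolist() == vectorized.tolist())
```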

# DataFrames
---

In [6]:
from pandas import DataFrame

In [7]:
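# Making a DataFrame # 1
# Using a dictionary: each key becomes a column name and each list
#     becomes that column's values.
# (The alphabetical column order in the output below comes from an older
#     pandas; pandas 0.23+ on Python 3.6+ keeps the dict's insertion order.)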

data = {'hero': ['billy', 'billy', 'billy', 'selina', 'selina'],
        'date': ['Jan 10', 'Jan 11', 'Jan 12', 'Jan 10', 'Jan 11'],
        'emails': [111, 121, 93, 211, 210]}

df = DataFrame(data)
df

     date  emails    hero
0  Jan 10     111   billy
1  Jan 11     121   billy
2  Jan 12      93   billy
3  Jan 10     211  selina
4  Jan 11     210  selina


In [8]:
df = DataFrame(data, columns=['date', 'hero', 'emails'])
df

     date    hero  emails
0  Jan 10   billy     111
1  Jan 11   billy     121
2  Jan 12   billy      93
3  Jan 10  selina     211
4  Jan 11  selina     210


In [9]:
df = DataFrame(data, columns=['date', 'hero', 'emails', 'instagrams'])

# df.index = [1, 2, 3, 4, 5]

# df
# df.columns

In [10]:
df['instagrams'] = 42

In [11]:
df

     date    hero  emails  instagrams
0  Jan 10   billy     111          42
1  Jan 11   billy     121          42
2  Jan 12   billy      93          42
3  Jan 10  selina     211          42
4  Jan 11  selina     210          42


In [12]:
df[['date', 'emails'    ] ]

     date  emails
0  Jan 10     111
1  Jan 11     121
2  Jan 12      93
3  Jan 10     211
4  Jan 11     210


In [13]:
# df['hero']
# df.hero

# df.loc[3]
df.loc[0:4:2]
# df.iloc[3:5]
# df.iloc[1:5:2]

# .ix was deprecated in pandas 0.20 and removed in pandas 1.0:
#     use df.loc (labels) and df.iloc (positions) instead.

     date    hero  emails  instagrams
0  Jan 10   billy     111          42
2  Jan 12   billy      93          42
4  Jan 11  selina     210          42
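`.loc` and `.iloc` also accept a second argument that picks columns. A minimal sketch rebuilding the `df` from above (without the `instagrams` column): `.loc` takes row and column labels and includes the end label of a slice, while `.iloc` takes integer positions and excludes the end position.

```python
import pandas as pd

data = {'hero': ['billy', 'billy', 'billy', 'selina', 'selina'],
        'date': ['Jan 10', 'Jan 11', 'Jan 12', 'Jan 10', 'Jan 11'],
        'emails': [111, 121, 93, 211, 210]}
df = pd.DataFrame(data, columns=['date', 'hero', 'emails'])

# Labels: rows 0 through 4 (inclusive) in steps of 2, two named columns.
by_label = df.loc[0:4:2, ['date', 'emails']]

# Positions: rows 0 up to (not including) 4 in steps of 2, columns 0 and 2.
by_position = df.iloc[0:4:2, [0, 2]]

print(by_label)
print(by_position)
```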


In [14]:
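from pandas import Series

# Attribute-style assignment (df.col = value) works for a column that
#     already exists. To CREATE a column, use df['new_col'] = value;
#     assigning to a new attribute name does not add a column.
df.instagrams = 50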

# df['instragram']



ins = Series([10, 20, 30], index=[1, 3, 5])
ins

1    10
3    20
5    30
dtype: int64

In [15]:
df.instagrams

0    50
1    50
2    50
3    50
4    50
Name: instagrams, dtype: int64

In [16]:
df.instagrams = ins
df

     date    hero  emails  instagrams
0  Jan 10   billy     111         NaN
1  Jan 11   billy     121        10.0
2  Jan 12   billy      93         NaN
3  Jan 10  selina     211        20.0
4  Jan 11  selina     210         NaN
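The NaN cells above come from index alignment: assigning a Series to a column matches rows by label, not by position. A minimal sketch contrasting that with assigning a plain list, which fills by position and must match the number of rows. The `aligned` and `positional` column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'emails': [111, 121, 93, 211, 210]})
ins = pd.Series([10, 20, 30], index=[1, 3, 5])

# A Series aligns on the index: rows 0, 2 and 4 have no matching label
# and become NaN, while label 5 matches no row and is dropped.
df['aligned'] = ins

# A plain list ignores labels and fills row by row instead.
df['positional'] = [10, 20, 30, 40, 50]

print(df)
```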


In [18]:
# If you want to add a new column, dataframes are completely
#     mutable: columns can be added at will.

df['overworked'] = df['emails'] >= 120
df

     date    hero  emails  instagrams  overworked
0  Jan 10   billy     111         NaN       False
1  Jan 11   billy     121        10.0        True
2  Jan 12   billy      93         NaN       False
3  Jan 10  selina     211        20.0        True
4  Jan 11  selina     210         NaN        True


In [19]:
df[    df.overworked == False       ]

     date   hero  emails  instagrams  overworked
0  Jan 10  billy     111         NaN       False
2  Jan 12  billy      93         NaN       False


In [20]:
# A comparison on a column produces a standalone boolean Series;
#     it only becomes part of the DataFrame if you assign it to a column.

standalone_series = df['emails'] >= 120
standalone_series

0    False
1     True
2    False
3     True
4     True
Name: emails, dtype: bool

In [27]:
standalone_series = df['instagrams'] != 10.0
standalone_series

0     True
1    False
2     True
3     True
4     True
Name: instagrams, dtype: bool
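Notice that rows 0, 2 and 4 came out `True` even though they hold no value at all. NaN compares unequal to everything, so `!=` marks every missing row as True. A minimal sketch of excluding missing values explicitly with `.notna()`:

```python
import numpy as np
import pandas as pd

instagrams = pd.Series([np.nan, 10.0, np.nan, 20.0, np.nan])

# NaN != 10.0 is True, so the missing rows pass this test.
naive = instagrams != 10.0

# Require a real value AND a difference from 10.
mask = instagrams.notna() & (instagrams != 10.0)
print(instagrams[mask])
```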

In [29]:
df[df.date != 'Jan 10']

     date    hero  emails  instagrams  overworked
1  Jan 11   billy     121        10.0        True
2  Jan 12   billy      93         NaN       False
4  Jan 11  selina     210         NaN        True


In [30]:
# Making a DataFrame # 2
# Using a dictionary with nested dictionaries...

data = {'billy': {'Jan 10': 202, 'Jan 11': 220, 'Jan 12': 198},
        'selina': {'Jan 09': 246, 'Jan 10': 235, 'Jan 11': 243}}

In [31]:
df2 = DataFrame(data)
df2

        billy  selina
Jan 09    NaN   246.0
Jan 10  202.0   235.0
Jan 11  220.0   243.0
Jan 12  198.0     NaN


In [32]:
# df2.T
dft = df2.T
dft

        Jan 09  Jan 10  Jan 11  Jan 12
billy      NaN   202.0   220.0   198.0
selina   246.0   235.0   243.0     NaN


In [34]:
dft.columns.name = 'date'
dft.index.name = 'hero'

In [35]:
dft

date    Jan 09  Jan 10  Jan 11  Jan 12
hero
billy      NaN   202.0   220.0   198.0
selina   246.0   235.0   243.0     NaN
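A wide layout like `dft` (one row per hero, one column per date) can also be produced from the "long" layout of the first DataFrame, which has one row per (hero, date) pair. A minimal sketch using `pivot()` on the first `data` dictionary. The email counts differ from `dft` because they come from a different dictionary, but the shape and the axis names match.

```python
import pandas as pd

data = {'hero': ['billy', 'billy', 'billy', 'selina', 'selina'],
        'date': ['Jan 10', 'Jan 11', 'Jan 12', 'Jan 10', 'Jan 11'],
        'emails': [111, 121, 93, 211, 210]}
df = pd.DataFrame(data)

# One row per hero, one column per date; missing combinations become NaN.
# The index and columns are named 'hero' and 'date' automatically.
wide = df.pivot(index='hero', columns='date', values='emails')
print(wide)
```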


In [36]:
# using indexes
nums = Series(range(10, 16),
              index=['t', 'u', 'v', 'x', 'y', 'z'])
nums

t    10
u    11
v    12
x    13
y    14
z    15
dtype: int64

In [37]:
i = nums.index
print(type(i), i)

<class 'pandas.indexes.base.Index'> Index(['t', 'u', 'v', 'x', 'y', 'z'], dtype='object')


In [41]:
i[::3]

# i[2:4]
# i[::2]
# i[::3]
# i[4]

Index(['t', 'x'], dtype='object')
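An `Index` can be sliced like a list, but unlike a list it is immutable. A minimal sketch: item assignment raises `TypeError`, so relabelling means building a new set of labels and assigning it to the Series as a whole.

```python
import pandas as pd

nums = pd.Series(range(10, 16), index=['t', 'u', 'v', 'x', 'y', 'z'])
i = nums.index

# Item assignment on an Index is refused.
raised = False
try:
    i[0] = 'a'
except TypeError as err:
    raised = True
    print('TypeError:', err)

# Relabel by replacing the whole index with new labels.
nums.index = ['a'] + list(i[1:])
print(nums.index.tolist())
```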

In [45]:
logs = pd.read_csv('../15/log_file_1000.csv', names=['name',
                                                     'email',
                                                     'fm_ip',
                                                     'to_ip',
                                                     'date_time',
                                                     'lat',
                                                     'long',
                                                     'payload_size'])

In [46]:
logs

             name                 email            fm_ip            to_ip            date_time       lat      long  payload_size
0     barry allen    ballen@jleague.org  155.130.121.215   75.122.133.241  2016-02-08T21:44:41  49.83160   8.01485        764272
1    arthur curry    acurry@jleague.org  106.152.115.161  106.152.114.248  2016-02-08T21:45:37  45.10327  11.68293        249206
2      john jones    jjones@jleague.org    60.15.193.250  155.130.121.215  2016-02-08T21:46:53  47.11673  10.35874        856820
3      wally west     wwest@jleague.org   190.214.22.201   190.214.22.116  2016-02-07T21:47:12  46.75616  11.47886        593774
4    arthur curry    acurry@jleague.org     60.15.193.74     60.15.193.95  2016-02-07T21:48:04  48.59134  12.30683        171910
5      ray palmer   rpalmer@jleague.org  106.152.114.248    102.86.56.203  2016-02-07T21:48:54  45.23082  10.90642        300389
6    victor stone    vstone@jleague.org     60.15.193.95  155.130.121.215  2016-02-07T21:49:02  48.20129  10.54183        916813
7     kyle rayner   krayner@jleague.org    220.211.18.48   106.152.115.49  2016-02-07T21:49:26  48.85730   9.23887        421284
8      clark kent     ckent@jleague.org  106.152.114.248   190.214.22.201  2016-02-06T21:50:26  48.05990   9.97774        458476
9  barbara gordon   bgordon@jleague.org     60.15.193.74  106.152.114.248  2016-02-06T21:50:50  46.43865   8.50877        752245


In [47]:
pd.read_csv?

In [None]:
logs.fm_ip.unique()

In [None]:
logs.name.value_counts()

In [54]:
logs.name.tail(13)

987        dick grayson
988        dick grayson
989        arthur curry
990        arthur curry
991      barbara gordon
992    john constantine
993        arthur curry
994         selina kyle
995          clark kent
996        billy batson
997         bruce wayne
998          ray palmer
999         bruce wayne
Name: name, dtype: object

In [55]:
g = logs.groupby(logs.fm_ip)
type(g)


pandas.core.groupby.DataFrameGroupBy

In [56]:
g.ngroups

30

In [58]:
g.first()

                           name                 email            to_ip            date_time       lat      long  payload_size
fm_ip
102.86.56.199      diana prince   dprince@jleague.org     60.15.193.74  2016-02-05T21:56:07  47.64781  11.90056        622470
102.86.56.203        hal jordan   hjordan@jleague.org    102.86.56.213  2016-01-31T22:15:49  46.56542   11.1041        481866
102.86.56.213        hal jordan   hjordan@jleague.org    102.86.56.213  2016-01-31T22:13:56  45.45445   11.0413        625106
102.86.56.243       bruce wayne    bwayne@jleague.org   106.152.115.49  2016-01-31T22:15:03  45.67383     9.921        768775
106.152.114.248      ray palmer   rpalmer@jleague.org    102.86.56.203  2016-02-07T21:48:54  45.23082  10.90642        300389
106.152.114.9       barry allen    ballen@jleague.org    102.86.56.243  2016-02-05T21:54:59  48.03074  10.41633         87758
106.152.115.130    dick grayson  dgrayson@jleague.org    220.211.18.31  2016-01-29T22:26:49  48.17151  10.57354        426194
106.152.115.161    arthur curry    acurry@jleague.org  106.152.114.248  2016-02-08T21:45:37  45.10327  11.68293        249206
106.152.115.49   barbara gordon   bgordon@jleague.org    220.211.18.12  2016-01-05T00:54:23  46.25667  12.25382        671268
155.130.120.114    diana prince   dprince@jleague.org    106.152.114.9  2016-02-03T22:03:55  49.89778   8.25535        174355


In [None]:
# Iterating over a GroupBy yields (key, sub-DataFrame) pairs.
for fm_ip, group in g:
    print(fm_ip)
    print(group)

In [61]:
type(g.get_group('106.152.115.161'))

pandas.core.frame.DataFrame

In [60]:
g.get_group('106.152.115.161').head(10)

               date_time                     email       lat      long              name  payload_size            to_ip
1    2016-02-08T21:45:37        acurry@jleague.org  45.10327  11.68293      arthur curry        249206  106.152.114.248
124  2016-01-18T23:26:50  jconstantine@jleague.org  46.51786    7.5144  john constantine        197413   106.152.115.49
137  2016-01-17T23:39:15        jjones@jleague.org  47.59412  11.93051        john jones        141543   75.122.132.124
149  2016-01-14T23:48:04       bbatson@jleague.org  48.42538  10.99307      billy batson        395075    75.122.133.10
168  2016-01-12T00:05:46         ckent@jleague.org  47.15144   8.52435        clark kent        924131  106.152.115.130
173  2016-01-12T00:09:58        ballen@jleague.org  45.83844   8.21253       barry allen        621909    60.15.193.249
188  2016-01-11T00:22:31        acurry@jleague.org   48.3262   9.66367      arthur curry         15034    75.122.133.75
201  2016-01-08T00:36:08       kzor-el@jleague.org  49.26437  12.09269       kara zor-el        760026   75.122.133.241
210  2016-01-06T00:43:44       bbatson@jleague.org  47.02258   7.75461      billy batson        389068    190.214.22.94
259  2015-12-31T01:24:57       bbatson@jleague.org   48.9545   9.75673      billy batson        503714   75.122.132.124
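Beyond `first()` and `get_group()`, the usual next step is to reduce each group to a summary number. A minimal sketch, assuming a few invented rows that reuse the log file's column names (the private IP addresses are hypothetical): select a column from the GroupBy, then call an aggregation, or several at once with `.agg()`.

```python
import pandas as pd

# Hypothetical rows using the same column names as the log file.
logs = pd.DataFrame({
    'fm_ip': ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.3', '10.0.0.1'],
    'name': ['bruce wayne', 'clark kent', 'diana prince', 'bruce wayne', 'hal jordan'],
    'payload_size': [100, 250, 50, 400, 25],
})

g = logs.groupby('fm_ip')

# One number per group: total and mean payload sent from each IP.
totals = g['payload_size'].sum()
means = g['payload_size'].mean()

# Several summaries at once; the result is a DataFrame indexed by fm_ip.
summary = g['payload_size'].agg(['count', 'sum', 'max'])
print(summary)
```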


In [63]:
def date_only(dt):
    day = dt.split('T')[0]
    return day

In [64]:
logs['date'] = logs.date_time.apply(date_only)
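`date_only` works, but pandas has vectorized equivalents that avoid calling a Python function per row. A minimal sketch on two `date_time` strings copied from the log output, assuming the column holds ISO-8601 text like those rows: the `.str` accessor splits every value at once, and `pd.to_datetime` parses the strings into real timestamps, which the `.dt` accessor can then take apart.

```python
import pandas as pd

date_time = pd.Series(['2016-02-08T21:44:41', '2016-02-07T21:47:12'])

# Vectorized string method: split each value on 'T' and keep the first piece.
date_str = date_time.str.split('T').str[0]

# Or parse into timestamps, which supports date arithmetic and grouping.
stamps = pd.to_datetime(date_time)
date_obj = stamps.dt.date    # datetime.date objects
hours = stamps.dt.hour       # integer hour of each timestamp

print(date_str.tolist())
```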

In [None]:
logs.columns

In [66]:
logs.date

0      2016-02-08
1      2016-02-08
2      2016-02-08
3      2016-02-07
4      2016-02-07
5      2016-02-07
6      2016-02-07
7      2016-02-07
8      2016-02-06
9      2016-02-06
10     2016-02-05
11     2016-02-05
12     2016-02-05
13     2016-02-05
14     2016-02-05
15     2016-02-05
16     2016-02-05
17     2016-02-05
18     2016-02-05
19     2016-02-04
20     2016-02-04
21     2016-02-03
22     2016-02-03
23     2016-02-03
24     2016-02-02
25     2016-02-02
26     2016-02-02
27     2016-02-02
28     2016-02-01
29     2016-02-01
          ...    
970    2015-09-09
971    2015-09-09
972    2015-09-09
973    2015-09-08
974    2015-09-08
975    2015-09-08
976    2015-09-08
977    2015-09-08
978    2015-09-08
979    2015-09-08
980    2015-09-08
981    2015-09-08
982    2015-09-07
983    2015-09-07
984    2015-09-07
985    2015-09-07
986    2015-09-07
987    2015-09-07
988    2015-09-07
989    2015-09-06
990    2015-09-06
991    2015-09-06
992    2015-09-06
993    2015-09-06
994    201

In [None]:
# True wherever a row's source IP equals its destination IP
tf = logs.fm_ip == logs.to_ip

In [68]:
tf

0      False
1      False
2      False
3      False
4      False
5      False
6      False
7      False
8      False
9      False
10     False
11      True
12     False
13     False
14     False
15     False
16     False
17     False
18     False
19     False
20     False
21     False
22     False
23     False
24     False
25     False
26     False
27     False
28     False
29     False
       ...  
970    False
971    False
972    False
973    False
974    False
975    False
976    False
977    False
978    False
979    False
980    False
981    False
982    False
983    False
984    False
985    False
986    False
987    False
988    False
989    False
990    False
991    False
992    False
993    False
994     True
995    False
996    False
997    False
998    False
999    False
dtype: bool

In [None]:
tf.unique()

In [70]:
tf.value_counts()

False    967
True      33
dtype: int64
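Because `True` counts as 1 and `False` as 0, a boolean Series can also be summed to count its matches. The same mask then pulls the matching rows out of the DataFrame. A minimal sketch on three invented rows (the IP addresses are hypothetical):

```python
import pandas as pd

# Hypothetical rows; only the middle one sends to itself.
logs = pd.DataFrame({'fm_ip': ['1.1.1.1', '2.2.2.2', '3.3.3.3'],
                     'to_ip': ['9.9.9.9', '2.2.2.2', '8.8.8.8']})

tf = logs.fm_ip == logs.to_ip

matches = tf.sum()       # number of True values
same_ip = logs[tf]       # rows where the mask is True

print(matches)
print(same_ip)
```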

In [71]:
logs[['fm_ip', 'to_ip']].head(12)

             fm_ip            to_ip
0  155.130.121.215   75.122.133.241
1  106.152.115.161  106.152.114.248
2    60.15.193.250  155.130.121.215
3   190.214.22.201   190.214.22.116
4     60.15.193.74     60.15.193.95
5  106.152.114.248    102.86.56.203
6     60.15.193.95  155.130.121.215
7    220.211.18.48   106.152.115.49
8  106.152.114.248   190.214.22.201
9     60.15.193.74  106.152.114.248


In [72]:
logs = pd.read_csv('../15/log_file_na.csv', names=['name',
                                                     'email',
                                                     'fm_ip',
                                                     'to_ip',
                                                     'date_time',
                                                     'lat',
                                                     'long',
                                                     'payload_size'])

In [73]:
logs

           name                 email            fm_ip            to_ip            date_time       lat      long  payload_size
0    hal jordan   hjordan@jleague.org  134.218.213.165     103.11.11.80  2016-02-08T21:45:08  49.87458       NaN           NaN
1    wally west     wwest@jleague.org  134.218.213.202   146.157.216.74  2016-02-08T21:45:54  47.63482       NaN       25668.0
2  dick grayson  dgrayson@jleague.org    50.133.32.218  146.157.216.247  2016-02-08T21:46:13  45.77999   8.59699           NaN
3  dick grayson  dgrayson@jleague.org    50.133.32.231   146.157.216.74  2016-02-08T21:46:40  48.44356  12.08197           NaN
4    wally west     wwest@jleague.org  240.237.148.218     8.79.168.234  2016-02-07T21:48:05  47.96539  10.39063        9999.0
5    ray palmer   rpalmer@jleague.org      103.11.11.6  146.157.216.247  2016-02-07T21:48:51  48.65536  10.57274      228285.0
6   dinah lance    dlance@jleague.org  134.218.213.176     8.79.168.167  2016-02-06T21:49:27  45.83758  11.38426           NaN
7    ray palmer   rpalmer@jleague.org      103.11.11.6     103.11.11.14  2016-02-06T21:49:40  48.77532   9.23111           NaN
8  arthur curry    acurry@jleague.org  134.218.213.136  146.157.216.209  2016-02-06T21:49:41  47.96156   9.77076        9999.0
9    ray palmer   rpalmer@jleague.org  146.157.216.247  134.218.213.184  2016-02-06T21:50:33  46.54799   9.86555      464457.0
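This file has gaps, shown as NaN. A minimal sketch of the common first moves with missing data, using a few `long` and `payload_size` values copied from the rows above into a small stand-alone frame (the `na_log` name mirrors the exercises below): count the gaps per column, drop incomplete rows, or fill a column with a chosen value. Reductions such as `max()` skip NaN by default.

```python
import numpy as np
import pandas as pd

# A few values copied from the rows above.
na_log = pd.DataFrame({'long': [np.nan, np.nan, 8.59699, 10.39063, 10.57274],
                       'payload_size': [np.nan, 25668.0, np.nan, 9999.0, 228285.0]})

missing_per_column = na_log.isnull().sum()    # NaN count for each column

complete = na_log.dropna()                    # keep only rows with no NaN
filled = na_log.fillna({'payload_size': 0})   # fill one column, leave the rest

largest = na_log['payload_size'].max()        # NaN values are skipped
print(missing_per_column)
print(largest)
```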


# Experience Points!
---

In your **text editor** create a simple script called:

```bash
my_DataFrame_01.py
```

Follow these steps:

1. Create a pandas `DataFrame` called `na_log` by reading in the csv `log_file_na.csv` and using the names:

    `['name', 'email', 'fm_ip', 'to_ip', 'date_time', 'lat', 'long', 'payload_size']`
1. Select the `payload_size` column and assign it to a variable as a separate pandas `Series`
1. Run both the `min()` method and the `max()` method on your `Series`
1. Calculate the difference between the two values you get back

Execute your script from the **terminal/command line** using the command:

```bash
ipython -i my_DataFrame_01.py
```

When you complete this exercise, please put your green post-it on your monitor. 

If you want to continue on at your own pace, please feel free to do so.

<img src='../images/green_sticky.300px.png' width='200' style='float:left'>

# Experience Points!
---

In your **text editor** create a simple script called:

```bash
my_DataFrame_02.py
```

Then do the following:

1. Create a pandas `DataFrame` called `na_log` by reading in the csv `log_file_na.csv` and using the names:
    1. `['name', 'email', 'fm_ip', 'to_ip', 'date_time', 'lat', 'long', 'payload_size']`
1. Print to the screen the content of the two columns: `long` and `lat`
1. Create a new Series that is made up of the **difference** between the value of `long` and the value of `lat`
1. Round each value in the Series to the nearest whole number (the Series `.round()` method leaves missing values as `NaN`, whereas Python's built-in `round()` raises an error on `NaN`)
1. Use the `.unique()` method to print out all of the unique values

Execute your script from the **terminal/command line** using the command:

```bash
ipython -i my_DataFrame_02.py
```

When you complete this exercise, please put your green post-it on your monitor. 

If you want to continue on at your own pace, please feel free to do so.

<img src='../images/green_sticky.300px.png' width='200' style='float:left'>

In [None]:
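# Series.round() handles the NaN values in this file; .apply(round) would
#     raise ValueError on the first NaN.
(logs.lat - logs.long).round().value_counts()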