# Introduction to Python and Jupyter Notebooks Review

To begin, make sure you know how to move between cells in a Jupyter notebook and how to switch a cell between code and markdown. If you want more practice styling markdown cells, see the [cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet). In this part of the notebook, we review some NumPy basics and create some simple plots with Matplotlib.

In [1]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/T8JGn4JRy4g?ecver=1" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>

In [2]:
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

### NumPy and Matplotlib

To begin, let's play with some basic `matplotlib` plots and the NumPy random methods. For more information please consult the documentation [here](https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.random.html). 
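
As a rough sketch of the kinds of draws used in the cells that follow, the example below samples from the same four distributions with a seeded generator so the results are repeatable. The seed, the variable names, and the use of `np.random.default_rng` are assumptions for illustration; the notebook itself calls the older `np.random.*` functions directly.

```python
import numpy as np

# Seeded generator so the draws are repeatable (the seed 0 is an arbitrary choice).
rng = np.random.default_rng(0)

ints = rng.integers(1, 20, size=100)       # integers in [1, 20): the upper bound is excluded
floats = rng.random(100)                   # floats in [0, 1)
normals = rng.normal(5, 10, size=100)      # mean 5, standard deviation 10
binoms = rng.binomial(100, 0.3, size=100)  # successes out of 100 trials with p = 0.3

print(ints.min(), ints.max())
print(floats.min() >= 0, floats.max() < 1)
```

Note that the integer draw never returns 20, which matches the histogram bins below stopping at 19.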

In [5]:
a = np.random.randint(1, 20, 100)

In [6]:
plt.figure()
plt.hist(a)

(array([ 9., 12., 11., 10.,  3.,  9.,  8., 14., 15.,  9.]),
 array([ 1. ,  2.8,  4.6,  6.4,  8.2, 10. , 11.8, 13.6, 15.4, 17.2, 19. ]),
 <a list of 10 Patch objects>)

In [8]:
b = np.random.random(100)
c = np.random.normal(5, 10, 100)
d = np.random.binomial(100, .3, 100)

In [9]:
plt.hist(b)

(array([11., 12., 12.,  5., 11.,  9., 19.,  8.,  7.,  6.]),
 array([0.00260586, 0.09985051, 0.19709517, 0.29433983, 0.39158448,
        0.48882914, 0.5860738 , 0.68331845, 0.78056311, 0.87780777,
        0.97505242]),
 <a list of 10 Patch objects>)

In [None]:
# NOTES - intro to interactive plotting
# with %matplotlib notebook, this second histogram is drawn on top of the first figure
# to get a separate plot, call plt.figure() before each new plot
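
The note above can be sketched as a standalone script. This is a minimal illustration, assuming the non-interactive `Agg` backend (chosen here only so the code runs outside a notebook) and made-up random data; the names `fig1` and `fig2` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so the sketch runs as a script
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

fig1 = plt.figure()               # first figure
plt.hist(rng.random(100))

fig2 = plt.figure()               # new figure; without this call the next hist would draw on fig1
plt.hist(rng.normal(5, 10, 100))

# Each figure holds exactly one set of axes.
print(len(fig1.axes), len(fig2.axes))
```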

In [6]:
np.random.binomial?

In [7]:
a[:5]

array([ 3,  5, 13, 13,  8])

In [8]:
plt.figure(figsize = (9, 6))

plt.subplot(2, 2, 1)
plt.hist(a)
plt.title("Random Integers")

plt.subplot(2, 2, 2)
plt.hist(b, color = 'green')
plt.title("Random Floats")

plt.subplot(2, 2, 3)
plt.hist(c, color = 'grey')
plt.title("Normal Distribution")

plt.subplot(2, 2, 4)
plt.hist(d, color = 'orange')
plt.title("Binomial Distribution")

Text(0.5,1,'Binomial Distribution')

In [9]:
plt.figure()
plt.scatter(c, d)
plt.title("Scatter Plot", loc = 'left')
plt.xticks([])
plt.yticks([])

([], <a list of 0 Text yticklabel objects>)

In [10]:
dists = [a, b, c, d]
plt.figure()
plt.boxplot(dists)
plt.title("Boxplots of Distributions", loc = "right")

Text(1,1,'Boxplots of Distributions')

In [11]:
import seaborn as sns

plt.figure()
for i in [a,c,d]:
    sns.distplot(i, hist = False)

### Loading Data: Intro to Pandas

Now we use the Pandas library to examine a variety of datasets. Below, I create four different `DataFrame` objects. The first three come from `.csv` files located in our **data** directory. The final one comes through the API from NYCOpenData. We will continue to visit methods of accessing and structuring data, but to begin we use these two popular options.

To load the `.csv` files, we provide Pandas with a path or url in the `.read_csv()` method.  I load all four datasets in what follows.
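
As a small sketch of what `read_csv` does, the example below parses CSV text held in memory instead of a file path or url, so that it runs without any data files. The column names and values are invented for illustration.

```python
import io
import pandas as pd

# Made-up CSV content standing in for a file on disk or at a url.
csv_text = "name,borough,n\nA,BROOKLYN,3\nB,QUEENS,5\n"

# read_csv accepts a path, a url, or any file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # rows and columns
print(df.dtypes)  # column types inferred by pandas
```

Passing a path such as `'data/eda_data/titanic.csv'` in place of the `StringIO` object works the same way.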

In [12]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/9Dsg9DQAU_g?ecver=1" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>

In [10]:
nyc311data = pd.read_json('https://data.cityofnewyork.us/resource/fhrw-4uyv.json')

In [11]:
nyc311data.columns

Index(['address_type', 'agency', 'agency_name', 'bbl', 'borough', 'city',
       'closed_date', 'community_board', 'complaint_type', 'created_date',
       'cross_street_1', 'cross_street_2', 'descriptor', 'due_date',
       'facility_type', 'incident_address', 'incident_zip',
       'intersection_street_1', 'intersection_street_2', 'latitude',
       'location', 'location_type', 'longitude', 'open_data_channel_type',
       'park_borough', 'park_facility_name', 'resolution_action_updated_date',
       'resolution_description', 'status', 'street_name',
       'taxi_pick_up_location', 'unique_key', 'x_coordinate_state_plane',
       'y_coordinate_state_plane'],
      dtype='object')

In [15]:
nyc311data.dtypes

address_type                       object
agency                             object
agency_name                        object
bbl                               float64
borough                            object
city                               object
closed_date                        object
community_board                    object
complaint_type                     object
created_date                       object
cross_street_1                     object
cross_street_2                     object
descriptor                         object
due_date                           object
facility_type                      object
incident_address                   object
incident_zip                      float64
intersection_street_1              object
intersection_street_2              object
landmark                           object
latitude                          float64
location                           object
location_type                      object
longitude                         

In [12]:
nyc311data.describe()

Unnamed: 0,bbl,incident_zip,latitude,longitude,unique_key,x_coordinate_state_plane,y_coordinate_state_plane
count,788.0,994.0,992.0,992.0,1000.0,992.0,992.0
mean,2821326000.0,10871.207243,40.733111,-73.914738,39546780.0,1007873.0,206390.224798
std,1204321000.0,546.764302,0.082205,0.079548,2405.281,22059.89,29947.421808
min,1000330000.0,10001.0,40.511559,-74.242685,39542680.0,916768.0,125745.0
25%,2028605000.0,10453.0,40.675169,-73.956432,39544620.0,996336.0,185261.0
50%,3026485000.0,11209.5,40.719661,-73.92055,39546700.0,1006240.0,201471.0
75%,4016903000.0,11364.75,40.801983,-73.866021,39548840.0,1021327.0,231463.25
max,5073550000.0,11694.0,40.907142,-73.729944,39550840.0,1059177.0,269790.0


In [14]:
complaints = nyc311data[['complaint_type', 'borough', 'agency', 'agency_name']]

In [None]:
# grabbing info from these 4 columns specifically
# appears complaint types are standardized, not individualized
# allows us to look for patterns, which complaints are most common

In [18]:
complaints.head()

Unnamed: 0,complaint_type,borough,agency,agency_name
0,Request Large Bulky Item Collection,BROOKLYN,DSNY,Department of Sanitation
1,Street Condition,QUEENS,DOT,Department of Transportation
2,Noise - Street/Sidewalk,MANHATTAN,NYPD,New York City Police Department
3,Noise - Residential,QUEENS,NYPD,New York City Police Department
4,Noise - Commercial,BROOKLYN,NYPD,New York City Police Department


In [19]:
complaints.groupby(by = 'borough').size()

borough
BRONX            199
BROOKLYN         261
MANHATTAN        247
QUEENS           263
STATEN ISLAND     28
Unspecified        2
dtype: int64

In [None]:
# the code above groups the complaints by borough
# .size() returns the number of complaints in each group
# the result is a Series; use .to_frame() or pd.DataFrame(...) to turn it into a DataFrame
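
A minimal sketch of the point in the notes above, using a tiny made-up table instead of the 311 data. The names `toy` and `n_complaints` are hypothetical.

```python
import pandas as pd

# Tiny invented dataset with the same two columns of interest.
toy = pd.DataFrame({
    "complaint_type": ["Noise", "Noise", "Blocked Driveway", "Illegal Parking"],
    "borough": ["BROOKLYN", "QUEENS", "BROOKLYN", "BRONX"],
})

counts = toy.groupby("borough").size()                # Series indexed by borough
as_frame = counts.to_frame(name="n_complaints")       # DataFrame; borough stays as the index
as_columns = counts.reset_index(name="n_complaints")  # DataFrame; borough becomes a regular column

print(type(counts).__name__, type(as_frame).__name__)
print(as_columns)
```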

In [15]:
c = complaints.groupby(by = 'borough').size()

In [16]:
type(c)

pandas.core.series.Series

In [17]:
c.index

Index(['BRONX', 'BROOKLYN', 'MANHATTAN', 'QUEENS', 'STATEN ISLAND',
       'Unspecified'],
      dtype='object', name='borough')

In [20]:
complaints[complaints['borough'] =='BROOKLYN'].sort_values('complaint_type')[:10]

Unnamed: 0,complaint_type,borough,agency,agency_name
133,Animal in a Park,BROOKLYN,DPR,Department of Parks and Recreation
908,Blocked Driveway,BROOKLYN,NYPD,New York City Police Department
272,Blocked Driveway,BROOKLYN,NYPD,New York City Police Department
191,Blocked Driveway,BROOKLYN,NYPD,New York City Police Department
282,Blocked Driveway,BROOKLYN,NYPD,New York City Police Department
318,Blocked Driveway,BROOKLYN,NYPD,New York City Police Department
170,Blocked Driveway,BROOKLYN,NYPD,New York City Police Department
329,Blocked Driveway,BROOKLYN,NYPD,New York City Police Department
248,Blocked Driveway,BROOKLYN,NYPD,New York City Police Department
112,Blocked Driveway,BROOKLYN,NYPD,New York City Police Department


In [19]:
complaints.iloc[133]

complaint_type                Noise - Residential
borough                                  BROOKLYN
agency                                       NYPD
agency_name       New York City Police Department
Name: 133, dtype: object

In [21]:
BK_COMPLAIN = complaints[complaints['borough'] == 'BROOKLYN']['complaint_type'].value_counts()

In [22]:
plt.figure(figsize = (7, 5))
plt.bar(BK_COMPLAIN.index[:6], BK_COMPLAIN[:6])

<BarContainer object of 6 artists>

In [None]:
# code below is used to rotate names on the x-axis

In [23]:
plt.tick_params(labelrotation = 20)

In [25]:
plt.rcParams

RcParams({'_internal.classic_mode': False,
          'agg.path.chunksize': 0,
          'animation.avconv_args': [],
          'animation.avconv_path': 'avconv',
          'animation.bitrate': -1,
          'animation.codec': 'h264',
          'animation.convert_args': [],
          'animation.convert_path': 'convert',
          'animation.embed_limit': 20.0,
          'animation.ffmpeg_args': [],
          'animation.ffmpeg_path': 'ffmpeg',
          'animation.frame_format': 'png',
          'animation.html': 'none',
          'animation.html_args': [],
          'animation.writer': 'ffmpeg',
          'axes.autolimit_mode': 'data',
          'axes.axisbelow': 'line',
          'axes.edgecolor': 'k',
          'axes.facecolor': 'w',
          'axes.formatter.limits': [-7, 7],
          'axes.formatter.min_exponent': 0,
          'axes.formatter.offset_threshold': 4,
          'axes.formatter.use_locale': False,
          'axes.formatter.use_mathtext': False,
          'axes.formatter

In [None]:
#plots in the exploratory phase are different from plots you would use to present to someone

In [None]:
# name to know: Edward Tufte - all about data visualization
# think about what each element brings to the graph; what value does it add?
# no extra ink should be used
# be careful with color; remember color-blindness

In [24]:
plt.rcParams["font.family"] = "fantasy"

plt.figure(figsize = (10, 7))
bars = plt.barh(BK_COMPLAIN.index[:5], BK_COMPLAIN[:5])
plt.title("Top 5 311 Complaints in Brooklyn", loc = 'left', fontsize = 16 )

Text(0,1,'Top 5 311 Complaints in Brooklyn')

In [26]:
labels = BK_COMPLAIN.index

In [27]:
for i in labels[:6]:
    print(i)

Noise - Residential
Noise - Street/Sidewalk
Noise - Commercial
Blocked Driveway
Illegal Parking
Noise - Vehicle


In [None]:
# looping through first 6 labels in line above

In [27]:
for i in range(5):
    label = labels[i]
    plt.gca().text(2, i, label, color = 'w', fontsize = 10)

In [None]:
# plt.gca() returns the current axes; its .text() method places text at the given data coordinates

In [28]:
# use booleans: recent matplotlib versions no longer treat the string 'off' as False
plt.tick_params(top = False, bottom = False, left = False, right = False, labelleft = False, labelbottom = False)

In [None]:
# the line above turns off the tick marks and the tick labels
# ticks can be controlled on each of the four sides (top, bottom, left, right)

In [29]:
for spine in plt.gca().spines.values():
    spine.set_visible(False)

In [None]:
# the loop above removes the box (the spines) around the plot

In [30]:
plt.savefig('images/brooklyn_complaining.png')

In [None]:
# To save the figure use the line above

In [None]:
# can you label the bars with the values?
# important note seaborn is built on top of matplotlib


In [30]:
nums = BK_COMPLAIN

In [31]:
nums.iloc[0]

176

In [None]:
for i in range(5):
    num = nums.iloc[i]  # count for the i-th bar
    plt.gca().text(30, i, num, color = 'w', fontsize = 10)

In [None]:
# the loop above is how to include the values in the bars as labels
# highlight the important things and leave everything else off
# changing a setting in plt.rcParams changes it for all later plots in the notebook
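
Putting the steps from these notes together, a bare horizontal bar chart with labels inside the bars might look like the sketch below. The counts are invented, the `Agg` backend is assumed only so the code runs as a script, and the text offset of 0.2 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts standing in for the top complaint types.
counts = pd.Series(
    [12, 9, 7],
    index=["Noise - Residential", "Blocked Driveway", "Illegal Parking"],
)

fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(counts.index, counts.values)

# Write the label and its value inside each bar; bar i sits at y = i.
for i, (label, value) in enumerate(counts.items()):
    ax.text(0.2, i, f"{label}: {value}", color="w", va="center", fontsize=10)

# Remove ticks, tick labels, and the surrounding box.
ax.tick_params(left=False, bottom=False, labelleft=False, labelbottom=False)
for spine in ax.spines.values():
    spine.set_visible(False)

ax.set_title("Toy complaint counts", loc="left")
```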

### Titanic Manipulation

In [31]:
titanic = pd.read_csv('data/eda_data/titanic.csv')
titanic.head()

Unnamed: 0,survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [32]:
titanic[titanic.pclass == 3][:5]

Unnamed: 0,survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
0,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
2,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
4,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
5,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.075,,S


In [33]:
titanic.sample(frac=0.1)[:5]

Unnamed: 0,survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
429,1,3,"Pickard, Mr. Berk (Berk Trembisky)",male,32.0,0,0,SOTON/O.Q. 392078,8.05,E10,S
219,0,2,"Harris, Mr. Walter",male,30.0,0,0,W/C 14208,10.5,,S
567,0,3,"Palsson, Mrs. Nils (Alma Cornelia Berglund)",female,29.0,0,4,349909,21.075,,S
571,1,1,"Appleton, Mrs. Edward Dale (Charlotte Lamson)",female,53.0,2,0,11769,51.4792,C101,S
333,0,3,"Vander Planke, Mr. Leo Edmondus",male,16.0,2,0,345764,18.0,,S


In [34]:
titanic.iloc[4:10]

Unnamed: 0,survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
4,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
5,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
6,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
7,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.075,,S
8,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S
9,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C


In [35]:
titanic.nlargest(10, 'age')

Unnamed: 0,survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
630,1,1,"Barkworth, Mr. Algernon Henry Wilson",male,80.0,0,0,27042,30.0,A23,S
851,0,3,"Svensson, Mr. Johan",male,74.0,0,0,347060,7.775,,S
96,0,1,"Goldschmidt, Mr. George B",male,71.0,0,0,PC 17754,34.6542,A5,C
493,0,1,"Artagaveytia, Mr. Ramon",male,71.0,0,0,PC 17609,49.5042,,C
116,0,3,"Connors, Mr. Patrick",male,70.5,0,0,370369,7.75,,Q
672,0,2,"Mitchell, Mr. Henry Michael",male,70.0,0,0,C.A. 24580,10.5,,S
745,0,1,"Crosby, Capt. Edward Gifford",male,70.0,1,1,WE/P 5735,71.0,B22,S
33,0,2,"Wheadon, Mr. Edward H",male,66.0,0,0,C.A. 24579,10.5,,S
54,0,1,"Ostby, Mr. Engelhart Cornelius",male,65.0,0,1,113509,61.9792,B30,C
280,0,3,"Duane, Mr. Frank",male,65.0,0,0,336439,7.75,,Q


In [36]:
titanic.nsmallest(10, 'age')

Unnamed: 0,survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
803,1,3,"Thomas, Master. Assad Alexander",male,0.42,0,1,2625,8.5167,,C
755,1,2,"Hamalainen, Master. Viljo",male,0.67,1,1,250649,14.5,,S
469,1,3,"Baclini, Miss. Helene Barbara",female,0.75,2,1,2666,19.2583,,C
644,1,3,"Baclini, Miss. Eugenie",female,0.75,2,1,2666,19.2583,,C
78,1,2,"Caldwell, Master. Alden Gates",male,0.83,0,2,248738,29.0,,S
831,1,2,"Richards, Master. George Sibley",male,0.83,1,1,29106,18.75,,S
305,1,1,"Allison, Master. Hudson Trevor",male,0.92,1,2,113781,151.55,C22 C26,S
164,0,3,"Panula, Master. Eino Viljami",male,1.0,4,1,3101295,39.6875,,S
172,1,3,"Johnson, Miss. Eleanor Ileen",female,1.0,1,1,347742,11.1333,,S
183,1,2,"Becker, Master. Richard F",male,1.0,2,1,230136,39.0,F4,S


In [37]:
gender = titanic[['survived', 'sex']]

In [38]:
gender[gender['survived'] == 0].groupby('sex').size()

sex
female     81
male      468
dtype: int64

In [62]:
# select rows with a boolean mask; .iloc only accepts integer positions
gender.loc[gender['survived'] == 0]

In [39]:
gender[gender['survived'] == 1].groupby('sex').size()

sex
female    233
male      109
dtype: int64

### Rock Songs

In [32]:
rockin = pd.read_csv('data/eda_data/rocking.csv', index_col = 0)

In [34]:
rockin.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2230 entries, 0 to 2229
Data columns (total 8 columns):
Song Clean      2230 non-null object
ARTIST CLEAN    2230 non-null object
Release Year    1653 non-null object
COMBINED        2230 non-null object
First?          2230 non-null int64
Year?           2230 non-null int64
PlayCount       2230 non-null int64
F*G             2230 non-null int64
dtypes: int64(4), object(4)
memory usage: 156.8+ KB


In [35]:
rockin.head()

Unnamed: 0,Song Clean,ARTIST CLEAN,Release Year,COMBINED,First?,Year?,PlayCount,F*G
0,Caught Up in You,.38 Special,1982.0,Caught Up in You by .38 Special,1,1,82,82
1,Fantasy Girl,.38 Special,,Fantasy Girl by .38 Special,1,0,3,0
2,Hold On Loosely,.38 Special,1981.0,Hold On Loosely by .38 Special,1,1,85,85
3,Rockin' Into the Night,.38 Special,1980.0,Rockin' Into the Night by .38 Special,1,1,18,18
4,Art For Arts Sake,10cc,1975.0,Art For Arts Sake by 10cc,1,1,1,1


In [56]:
#rockin = rockin.rename({'First?': 'First', 'Year?': 'Year', 'F*G': 'fg'}, axis = 1)

In [77]:
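# note: without columns= (or axis = 1), rename applies the mapping to the index labels,
# so the column names are left unchanged by this call
rockin = rockin.rename({'Song Clean':'Song', 'ARTIST CLEAN':'Artist', 'Release Year':'year_released'})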

In [78]:
#null_release_mask = rockin['Release Year'].isnull()
#rockin.loc[null_release_mask, 'Release Year'] = 0

In [79]:
rockin.index

Int64Index([   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,
            ...
            2220, 2221, 2222, 2223, 2224, 2225, 2226, 2227, 2228, 2229],
           dtype='int64', length=2230)

In [80]:
rockin.describe()

Unnamed: 0,First?,Year?,PlayCount,F*G
count,2230.0,2230.0,2230.0,2230.0
mean,1.0,0.741256,16.872646,15.04843
std,0.0,0.438043,25.302972,25.288366
min,1.0,0.0,0.0,0.0
25%,1.0,0.0,1.0,0.0
50%,1.0,1.0,4.0,3.0
75%,1.0,1.0,21.0,18.0
max,1.0,1.0,142.0,142.0


In [54]:
rockin.dtypes

Song Clean      object
ARTIST CLEAN    object
Release Year    object
COMBINED        object
First?           int64
Year?            int64
PlayCount        int64
F*G              int64
dtype: object

In [81]:
rockin.columns

Index(['Song Clean', 'ARTIST CLEAN', 'Release Year', 'COMBINED', 'First?',
       'Year?', 'PlayCount', 'F*G'],
      dtype='object')

In [50]:
rockin.sort_values(by='ARTIST CLEAN')

Unnamed: 0,Song Clean,ARTIST CLEAN,Release Year,COMBINED,First?,Year?,PlayCount,F*G
0,Caught Up in You,.38 Special,1982,Caught Up in You by .38 Special,1,1,82,82
1,Fantasy Girl,.38 Special,,Fantasy Girl by .38 Special,1,0,3,0
2,Hold On Loosely,.38 Special,1981,Hold On Loosely by .38 Special,1,1,85,85
3,Rockin' Into the Night,.38 Special,1980,Rockin' Into the Night by .38 Special,1,1,18,18
4,Art For Arts Sake,10cc,1975,Art For Arts Sake by 10cc,1,1,1,1
5,Kryptonite,3 Doors Down,2000,Kryptonite by 3 Doors Down,1,1,13,13
6,Loser,3 Doors Down,2000,Loser by 3 Doors Down,1,1,1,1
7,When I'm Gone,3 Doors Down,2002,When I'm Gone by 3 Doors Down,1,1,6,6
8,What's Up?,4 Non Blondes,1992,What's Up? by 4 Non Blondes,1,1,3,3
26,Moneytalks,AC/DC,,Moneytalks by AC/DC,1,0,20,0


In [57]:
rockin.sort_values(by='ARTIST CLEAN').size

17840

In [59]:
rockin.groupby(by = 'ARTIST CLEAN').size().sort_values(ascending= False)

ARTIST CLEAN
The Beatles                      100
Led Zeppelin                      69
Rolling Stones                    55
Van Halen                         44
Pink Floyd                        39
The Who                           31
Aerosmith                         31
AC/DC                             29
Tom Petty & The Heartbreakers     29
Bob Seger                         24
Fleetwood Mac                     24
Heart                             24
Bruce Springsteen                 23
Paul McCartney & Wings            23
Elton John                        22
ZZ Top                            22
Eric Clapton                      21
The Doors                         21
Eagles                            20
Rush                              20
Metallica                         20
Billy Joel                        19
David Bowie                       19
Queen                             19
Ozzy Osbourne                     19
U2                                19
Creedence Clearwater Revi

In [64]:
rockin[['ARTIST CLEAN', 'PlayCount']].groupby('ARTIST CLEAN').sum()

Unnamed: 0_level_0,PlayCount
ARTIST CLEAN,Unnamed: 1_level_1
.38 Special,188
10cc,1
3 Doors Down,20
4 Non Blondes,3
AC/DC,866
Ace,1
Adelitas Way,4
Aerosmith,813
Alanis Morissette,7
Alannah Myles,1


In [68]:
rockin.loc[:,['Release Year']]

Unnamed: 0,Release Year
0,1982
1,
2,1981
3,1980
4,1975
5,2000
6,2000
7,2002
8,1992
9,1985


In [115]:
release = rockin.loc[:,['Release Year', 'COMBINED', 'Song Clean']]

In [None]:
# trying to answer song most played in the 80s
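
One way to approach this question is sketched below on a tiny invented table. It assumes, as the `info()` output suggests, that the release year is stored as text and that some entries are missing or not numeric. The column names mirror the dataset, but the rows and the names `songs`, `years`, and `top` are hypothetical.

```python
import pandas as pd

# Invented rows; "Release Year" is text with a missing and a non-numeric entry.
songs = pd.DataFrame({
    "Song Clean": ["A", "B", "C", "D", "E"],
    "Release Year": ["1982", None, "1975", "1987", "not a year"],
    "PlayCount": [82, 3, 140, 97, 200],
})

# Convert to numbers; anything that cannot be parsed becomes NaN.
years = pd.to_numeric(songs["Release Year"], errors="coerce")

# Keep songs released from 1980 through 1989 (NaN never passes the test).
in_eighties = years.between(1980, 1989)

# Sort the remaining songs by play count and take the top one.
top = songs[in_eighties].sort_values("PlayCount", ascending=False).head(1)
print(top)
```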

In [116]:
release.columns

Index(['Release Year', 'COMBINED', 'Song Clean'], dtype='object')

In [121]:
release.head()

Unnamed: 0,Release Year,COMBINED,Song Clean
0,1982.0,Caught Up in You by .38 Special,Caught Up in You
1,,Fantasy Girl by .38 Special,Fantasy Girl
2,1981.0,Hold On Loosely by .38 Special,Hold On Loosely
3,1980.0,Rockin' Into the Night by .38 Special,Rockin' Into the Night
4,1975.0,Art For Arts Sake by 10cc,Art For Arts Sake


In [137]:
# release years are stored as strings, so build the 1980s as strings too
eighties = [str(i) for i in range(1980, 1990)]

rockin[rockin['Release Year'].isin(eighties)]

Unnamed: 0,Song Clean,ARTIST CLEAN,Release Year,COMBINED,First?,Year?,PlayCount,F*G,Relase Year
0,Caught Up in You,.38 Special,1982,Caught Up in You by .38 Special,1,1,82,82,1982.0
2,Hold On Loosely,.38 Special,1981,Hold On Loosely by .38 Special,1,1,85,85,1981.0
3,Rockin' Into the Night,.38 Special,1980,Rockin' Into the Night by .38 Special,1,1,18,18,1980.0
9,Take On Me,a-ha,1985,Take On Me by a-ha,1,1,1,1,1985.0
11,Back In Black,AC/DC,1980,Back In Black by AC/DC,1,1,97,97,1980.0
15,For Those About To Rock,AC/DC,1981,For Those About To Rock by AC/DC,1,1,46,46,1981.0
18,Have a Drink On Me,AC/DC,1980,Have a Drink On Me by AC/DC,1,1,39,39,1980.0
19,Hells Bells,AC/DC,1980,Hells Bells by AC/DC,1,1,74,74,1980.0
22,Jailbreak,AC/DC,1984,Jailbreak by AC/DC,1,1,1,1,1984.0
28,Rock and Roll Ain't Noise Pollution,AC/DC,1980,Rock and Roll Ain't Noise Pollution by AC/DC,1,1,21,21,1980.0


In [59]:
#rockin['ARTIST CLEAN'].unique()[::10]

In [124]:
rockin['Release Year'].value_counts()

1973             104
1977              83
1975              83
1970              81
1971              75
1969              72
1980              70
1978              64
1979              63
1981              61
1967              61
1983              60
1976              56
1982              54
1984              51
1972              50
1974              48
1968              46
1987              39
1985              39
1986              37
1991              34
1989              32
1966              30
1988              29
1965              28
1994              25
1990              22
1993              19
1964              14
1992              14
1999              13
1995              10
1996               9
1997               9
1963               9
1998               6
2002               6
2012               5
2005               5
2004               5
2001               4
2008               3
2007               3
1962               3
2000               3
2003               3
2011         

In [127]:
rockin['Relase Year'] = pd.to_numeric(rockin['Release Year'], errors = 'coerce')

In [133]:
rockin.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2230 entries, 0 to 2229
Data columns (total 9 columns):
Song Clean      2230 non-null object
ARTIST CLEAN    2230 non-null object
Release Year    1653 non-null object
COMBINED        2230 non-null object
First?          2230 non-null int64
Year?           2230 non-null int64
PlayCount       2230 non-null int64
F*G             2230 non-null int64
Relase Year     1652 non-null float64
dtypes: float64(1), int64(4), object(4)
memory usage: 174.2+ KB


In [118]:
release.sort_values('Release Year')

Unnamed: 0,Release Year,COMBINED,Song Clean
547,1071,Levon by Elton John,Levon
148,1955,Rock Around the Clock by Bill Haley,Rock Around the Clock
341,1958,JOHNNY B. GOODE by Chuck Berry,JOHNNY B. GOODE
1759,1961,Cry For A Shadow by The Beatles,Cry For A Shadow
1804,1962,P.s. I Love You by The Beatles,P.s. I Love You
1795,1962,Love Me Do by The Beatles,Love Me Do
258,1962,Green Onions by Booker T. and the MG's,Green Onions
1818,1963,She Loves You by The Beatles,She Loves You
1783,1963,I Want To Hold Your Hand by The Beatles,I Want To Hold Your Hand
1782,1963,I Saw Her Standing There by The Beatles,I Saw Her Standing There
