### Pandas Series

A pandas Series is like a single column in an Excel sheet: a one-dimensional array of values, each paired with an index label.

### How to Create a Series

A pandas Series can be created from a Python list or a NumPy array. Unlike a Python list, a Series has a single data type (dtype) for all of its values; mixed values are stored under the generic `object` dtype. This makes a NumPy array, which is also homogeneous, a natural source for creating a pandas Series.

In [20]:
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0])
display(data)

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64
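
The single-dtype behaviour described above can be checked directly. The following is a minimal sketch; the variable names and sample values are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Integers mixed with floats are upcast to one common numeric dtype.
ints_and_floats = pd.Series([1, 2.5, 3])

# Values with no common numeric type fall back to the generic object dtype.
mixed = pd.Series([1, "two", 3.0])

# A NumPy array already has one dtype, and the Series keeps it.
from_array = pd.Series(np.array([10, 20, 30], dtype=np.int32))

print(ints_and_floats.dtype)
print(mixed.dtype)
print(from_array.dtype)
```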

By default, Jupyter notebooks render DataFrames as an easy-to-read HTML table through `display()`, whereas `print()` produces plain-text output that is harder to follow for wide tables. The value of the last expression in a notebook cell is displayed automatically, so `display()` is not needed there.

In [21]:
display(data.values)  # the values are a NumPy array
display(data.index)   # the index is an array-like object of type pd.Index

array([0.25, 0.5 , 0.75, 1.  ])

RangeIndex(start=0, stop=4, step=1)

To use our own row labels when creating a Series, pass the `index` parameter, which accepts a list or a NumPy array of labels with the same length as the data.

In [22]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],index=['a', 'b', 'c', 'd'])
display(data)

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64
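
One reason to supply custom labels is that values can then be looked up by label. The sketch below assumes the same four values as above and passes the labels as a NumPy array; the names `labels`, `single`, and `subset` are illustrative.

```python
import numpy as np
import pandas as pd

# The index can be given as a NumPy array instead of a list.
labels = np.array(['a', 'b', 'c', 'd'])
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=labels)

single = data['b']          # look up one value by its label
subset = data.loc['a':'c']  # label-based slices include the end label

print(single)
print(subset)
```

Note that, unlike positional slicing, a label slice such as `'a':'c'` includes both endpoints.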

### Creating a DataFrame

In [23]:
sales = {
    'account': ['Jones LLC', 'Alpha Co', 'Blue INC'],
    'Jan': [105, 110, 130],
    'Feb': [210, 210, 90],
    'Mar': [180, 280, 95],
}

We can create a pandas DataFrame from this dictionary as follows:

In [24]:
import pandas as pd
df = pd.DataFrame(sales)
display(df)

|   | account | Jan | Feb | Mar |
|---|---|---|---|---|
| 0 | Jones LLC | 105 | 210 | 180 |
| 1 | Alpha Co | 110 | 210 | 280 |
| 2 | Blue INC | 130 | 90 | 95 |


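In Python versions before 3.7, dictionaries did not guarantee insertion order, so the columns could appear in a different sequence from the one defined in the dictionary. From Python 3.7 onward, dictionaries preserve insertion order, and recent pandas versions keep the columns in the order in which the keys were defined.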
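
If a particular column order matters, it can be requested explicitly through the `columns` parameter of `pd.DataFrame` instead of relying on the dictionary. A minimal sketch using the same `sales` data; the chosen order is only an example.

```python
import pandas as pd

sales = {
    'account': ['Jones LLC', 'Alpha Co', 'Blue INC'],
    'Jan': [105, 110, 130],
    'Feb': [210, 210, 90],
    'Mar': [180, 280, 95],
}

# Column order taken from the dictionary keys.
df_default = pd.DataFrame(sales)

# Column order stated explicitly, independent of the dictionary.
df_ordered = pd.DataFrame(sales, columns=['account', 'Mar', 'Feb', 'Jan'])

print(list(df_default.columns))
print(list(df_ordered.columns))
```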

### Reading CSV Files

Download the movies.csv file using one of the two methods below.

### Method 1: Open the link below and save the file as movies.csv

https://gist.githubusercontent.com/tiangechen/b68782efa49a16edaf07dc2cdaa855ea/raw/0c794a9717f18b094eabab2cd6a6b9a226903577/movies.csv

### Method 2: Using the curl Command

In [25]:
!curl "https://gist.githubusercontent.com/tiangechen/b68782efa49a16edaf07dc2cdaa855ea/raw/0c794a9717f18b094eabab2cd6a6b9a226903577/movies.csv?accessType=DOWNLOAD" > movies.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5073  100  5073    0     0   4175      0  0:00:01  0:00:01 --:--:--  4178


In [26]:
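import warnings
# Suppress warnings in the Jupyter notebook. Only one filter is set here:
# a later filterwarnings() call takes precedence over an earlier one, so
# adding action='once' after 'ignore' would show each warning once instead.
warnings.filterwarnings('ignore')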

In [27]:
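# on_bad_lines='skip' replaces error_bad_lines=False, which was
# deprecated in pandas 1.3 and removed in pandas 2.0
movies = pd.read_csv('movies.csv', sep=',', on_bad_lines='skip')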
display(movies.head())

|   | Film | Genre | Lead Studio | Audience score % | Profitability | Rotten Tomatoes % | Worldwide Gross | Year |
|---|---|---|---|---|---|---|---|---|
| 0 | Zack and Miri Make a Porno | Romance | The Weinstein Company | 70 | 1.747542 | 64 | $41.94 | 2008 |
| 1 | Youth in Revolt | Comedy | The Weinstein Company | 52 | 1.09 | 68 | $19.62 | 2010 |
| 2 | You Will Meet a Tall Dark Stranger | Comedy | Independent | 35 | 1.211818 | 43 | $26.66 | 2010 |
| 3 | When in Rome | Comedy | Disney | 44 | 0.0 | 15 | $43.04 | 2010 |
| 4 | What Happens in Vegas | Comedy | Fox | 72 | 6.267647 | 28 | $219.37 | 2008 |
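
To see what skipping malformed rows means in practice, the sketch below reads a small in-memory CSV instead of the downloaded file. It assumes pandas 1.3 or newer (where `on_bad_lines` is available); the three-column sample and the deliberately malformed second row are made up for illustration.

```python
import io
import pandas as pd

# A tiny stand-in for movies.csv; the second data row has one field too many.
csv_text = (
    "Film,Genre,Year\n"
    "Youth in Revolt,Comedy,2010\n"
    "When in Rome,Comedy,2010,unexpected extra field\n"
    "What Happens in Vegas,Comedy,2008\n"
)

# Rows with too many fields are dropped instead of raising a parser error.
sample = pd.read_csv(io.StringIO(csv_text), sep=',', on_bad_lines='skip')
print(sample)
```

Silently skipping rows can hide data problems, so it may be worth comparing the row count with the source file when this option is used.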


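The code below disables SSL certificate verification so that files can be downloaded from sites without a valid certificate. It is not required if the website is secured with a valid certificate, and it should be used only with sources you trust.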

In [28]:
import os, ssl
if (not os.environ.get('PYTHONHTTPSVERIFY', '') and getattr(ssl, '_create_unverified_context', None)):
    ssl._create_default_https_context = ssl._create_unverified_context

### Reading Excel File

In [29]:
import numpy as np
import datetime

import pandas as pd
dataset_path = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00247/data_akbilgic.xlsx'

stock_data = pd.read_excel(dataset_path, header=1)
display(stock_data.head())


|   | date | ISE | ISE.1 | SP | DAX | FTSE | NIKKEI | BOVESPA | EU | EM |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2009-01-05 | 0.035754 | 0.038376 | -0.004679 | 0.002193 | 0.003894 | 0.0 | 0.03119 | 0.012698 | 0.028524 |
| 1 | 2009-01-06 | 0.025426 | 0.031813 | 0.007787 | 0.008455 | 0.012866 | 0.004162 | 0.01892 | 0.011341 | 0.008773 |
| 2 | 2009-01-07 | -0.028862 | -0.026353 | -0.030469 | -0.017833 | -0.028735 | 0.017293 | -0.035899 | -0.017073 | -0.020015 |
| 3 | 2009-01-08 | -0.062208 | -0.084716 | 0.003391 | -0.011726 | -0.000466 | -0.040061 | 0.028283 | -0.005561 | -0.019424 |
| 4 | 2009-01-09 | 0.00986 | 0.009658 | -0.021533 | -0.019873 | -0.01271 | -0.004474 | -0.009764 | -0.010989 | -0.007802 |


To load only the required columns, pass their names to the `usecols` parameter.

In [30]:
import pandas as pd
dataset_path = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00247/data_akbilgic.xlsx'

stock_data = pd.read_excel(dataset_path, header=1, usecols=['date', 'ISE', 'SP'])
display(stock_data.head())

|   | date | ISE | SP |
|---|---|---|---|
| 0 | 2009-01-05 | 0.035754 | -0.004679 |
| 1 | 2009-01-06 | 0.025426 | 0.007787 |
| 2 | 2009-01-07 | -0.028862 | -0.030469 |
| 3 | 2009-01-08 | -0.062208 | 0.003391 |
| 4 | 2009-01-09 | 0.00986 | -0.021533 |
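
The `usecols` parameter is also accepted by `pd.read_csv`, which makes it possible to try out column selection without downloading the Excel file. The sketch below uses a two-row in-memory CSV copied from the output above; `parse_dates` is an extra option, not used in the original cell, that converts the `date` column to a datetime type.

```python
import io
import pandas as pd

# Two rows and five columns taken from the stock data shown above.
csv_text = (
    "date,ISE,ISE.1,SP,DAX\n"
    "2009-01-05,0.035754,0.038376,-0.004679,0.002193\n"
    "2009-01-06,0.025426,0.031813,0.007787,0.008455\n"
)

# Only the listed columns are parsed; the others are never loaded.
subset = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['date', 'ISE', 'SP'],
    parse_dates=['date'],
)
print(subset)
print(subset.dtypes)
```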


Check each column of the movies DataFrame for missing values:

In [31]:
movies.isnull().sum()

Film                 0
Genre                0
Lead Studio          0
Audience score %     0
Profitability        0
Rotten Tomatoes %    0
Worldwide Gross      0
Year                 0
dtype: int64
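
The movies data has no missing values, so every count is zero. To show what a non-zero result looks like, here is a small sketch on a made-up DataFrame (`toy` and its contents are hypothetical).

```python
import numpy as np
import pandas as pd

# A hypothetical frame with gaps in two of its three columns.
toy = pd.DataFrame({
    'Film': ['A', 'B', 'C'],
    'Year': [2008, np.nan, 2010],
    'Genre': ['Comedy', None, None],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```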

### Selecting a single column

In [32]:
Film_Name = movies['Film']
display(Film_Name)

0             Zack and Miri Make a Porno
1                        Youth in Revolt
2     You Will Meet a Tall Dark Stranger
3                           When in Rome
4                  What Happens in Vegas
                     ...                
72                   Across the Universe
73                         A Serious Man
74                    A Dangerous Method
75                            27 Dresses
76                  (500) Days of Summer
Name: Film, Length: 77, dtype: object

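You can perform the same task using the dot operator, provided the column name is a valid Python identifier (no spaces or special characters) and does not clash with an existing DataFrame attribute or method.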

In [33]:
Film_Name = movies.Film
display(Film_Name)

0             Zack and Miri Make a Porno
1                        Youth in Revolt
2     You Will Meet a Tall Dark Stranger
3                           When in Rome
4                  What Happens in Vegas
                     ...                
72                   Across the Universe
73                         A Serious Man
74                    A Dangerous Method
75                            27 Dresses
76                  (500) Days of Summer
Name: Film, Length: 77, dtype: object
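
Dot access does not work for every column name. The following sketch, on a made-up three-column DataFrame, shows two cases where the bracket form is required: a name containing a space, and a name that collides with a DataFrame method (`count` is chosen here purely as an example).

```python
import pandas as pd

df = pd.DataFrame({
    'Film': ['When in Rome', 'What Happens in Vegas'],
    'Lead Studio': ['Disney', 'Fox'],
    'count': [1, 2],
})

film = df.Film               # works: a valid identifier with no name clash
studio = df['Lead Studio']   # a name with a space needs brackets
count_attr = df.count        # this is the DataFrame.count method, not the column
count_col = df['count']      # brackets always return the column

print(callable(count_attr))
print(count_col.tolist())
```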

### Selecting Multiple Columns

In [34]:
columns = ['Film', 'Genre', 'Lead Studio', 'Profitability', 'Year']
movies[columns].head()

|   | Film | Genre | Lead Studio | Profitability | Year |
|---|---|---|---|---|---|
| 0 | Zack and Miri Make a Porno | Romance | The Weinstein Company | 1.747542 | 2008 |
| 1 | Youth in Revolt | Comedy | The Weinstein Company | 1.09 | 2010 |
| 2 | You Will Meet a Tall Dark Stranger | Comedy | Independent | 1.211818 | 2010 |
| 3 | When in Rome | Comedy | Disney | 0.0 | 2010 |
| 4 | What Happens in Vegas | Comedy | Fox | 6.267647 | 2008 |


### Selecting columns using "select_dtypes" and "filter" methods

Before selecting columns with the `select_dtypes` method, it helps to find out how many columns there are of each data type.

In [35]:
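# get_dtype_counts() was deprecated in pandas 0.25 and removed in 1.0;
# counting the entries of movies.dtypes gives the same information.
movies.dtypes.astype(str).value_counts().sort_index()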


float64    1
int64      3
object     4
dtype: int64

In [36]:
movies.select_dtypes(include=['int', 'float']).head()

|   | Audience score % | Profitability | Rotten Tomatoes % | Year |
|---|---|---|---|---|
| 0 | 70 | 1.747542 | 64 | 2008 |
| 1 | 52 | 1.09 | 68 | 2010 |
| 2 | 35 | 1.211818 | 43 | 2010 |
| 3 | 44 | 0.0 | 15 | 2010 |
| 4 | 72 | 6.267647 | 28 | 2008 |
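
Besides listing individual types, `select_dtypes` accepts the generic `'number'` selector and an `exclude` argument. A minimal sketch on a two-row sample with values taken from the movies data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Film': ['When in Rome', 'What Happens in Vegas'],
    'Audience score %': [44, 72],
    'Profitability': [0.0, 6.267647],
    'Worldwide Gross': ['$43.04', '$219.37'],
})

# 'number' matches every integer and float column in one go.
numeric = df.select_dtypes(include='number')

# exclude= returns the complement: here, the text columns.
non_numeric = df.select_dtypes(exclude='number')

print(list(numeric.columns))
print(list(non_numeric.columns))
```

`Worldwide Gross` lands among the text columns because its values carry a `$` prefix and are therefore read as strings.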


You can also use the filter method to select columns based on the column names or index labels.

In [37]:
movies.filter(like='score').head()

|   | Audience score % |
|---|---|
| 0 | 70 |
| 1 | 52 |
| 2 | 35 |
| 3 | 44 |
| 4 | 72 |
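
The `filter` method has three mutually exclusive ways of matching column names: `items` (exact names), `like` (substring), and `regex` (regular expression). The sketch below tries each on a small sample with column names from the movies data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Film': ['When in Rome', 'What Happens in Vegas'],
    'Audience score %': [44, 72],
    'Rotten Tomatoes %': [15, 28],
    'Year': [2010, 2008],
})

by_items = df.filter(items=['Film', 'Year'])  # exact column names
by_like = df.filter(like='score')             # names containing 'score'
by_regex = df.filter(regex='%$')              # names ending in '%'

print(list(by_items.columns))
print(list(by_like.columns))
print(list(by_regex.columns))
```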


### Changing the Order of Columns

In [39]:
new_cols = ['Film', 'Genre', 'Year', 'Lead Studio', 'Audience score %', 'Rotten Tomatoes %', 'Profitability',
        'Worldwide Gross']

# to check if new_cols contains all the columns from the original
set(movies.columns) == set(new_cols)

True

In [40]:
movies2 = movies[new_cols]
movies2.head()

|   | Film | Genre | Year | Lead Studio | Audience score % | Rotten Tomatoes % | Profitability | Worldwide Gross |
|---|---|---|---|---|---|---|---|---|
| 0 | Zack and Miri Make a Porno | Romance | 2008 | The Weinstein Company | 70 | 64 | 1.747542 | $41.94 |
| 1 | Youth in Revolt | Comedy | 2010 | The Weinstein Company | 52 | 68 | 1.09 | $19.62 |
| 2 | You Will Meet a Tall Dark Stranger | Comedy | 2010 | Independent | 35 | 43 | 1.211818 | $26.66 |
| 3 | When in Rome | Comedy | 2010 | Disney | 44 | 15 | 0.0 | $43.04 |
| 4 | What Happens in Vegas | Comedy | 2008 | Fox | 72 | 28 | 6.267647 | $219.37 |
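
When only one or two columns need to move, typing out the full list by hand is error-prone. One possible alternative, sketched below on a made-up three-column DataFrame, builds the new order from the existing column names so that no column can be dropped or misspelled.

```python
import pandas as pd

df = pd.DataFrame({
    'Film': ['When in Rome'],
    'Genre': ['Comedy'],
    'Year': [2010],
})

# Put 'Year' first and keep every other column in its current order.
new_cols = ['Year'] + [col for col in df.columns if col != 'Year']
reordered = df[new_cols]

print(list(reordered.columns))
```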


### Assignment - 1

In [41]:
# Create a new DataFrame named auto_mobile by following these instructions:
# Read the dataset using pandas from this link (http://archive.ics.uci.edu/ml/datasets/Auto+MPG)
# Add the column names to the DataFrame
# Read only the first 100 rows