<a href="https://colab.research.google.com/github/carlosfmorenog/CMM202/blob/master/CMM202_Topic_2/CMM202_T2_Lec.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CMM202 Topic 2: Loading Data & Manipulating Rows

## Lecture objectives

1) Understand how we define data (for the purpose of this module) and where to get it

2) Learn how to effectively import/export data into practical representations

3) Apply basic commands and concepts to explore it

## What is data (for the purpose of this course)?

Information or characteristics collected through observation or experimentation

Using a tabular approach...

Each entry/observation corresponds to a `row`

Each feature/characteristic corresponds to a `column`
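
To make the row/column vocabulary concrete, here is a minimal sketch that builds a tiny table by hand. The column names `name`, `age` and `height_m` and the three observations are illustrative assumptions, not a data set used elsewhere in the module.

```python
import pandas as pd

# Each dictionary is one observation (a row);
# each key is one feature (a column). Names and values are illustrative.
observations = [
    {"name": "Nick", "age": 21, "height_m": 1.85},
    {"name": "Chris", "age": 29, "height_m": 1.79},
    {"name": "Tim", "age": 28, "height_m": 1.75},
]

people = pd.DataFrame(observations)
n_rows, n_cols = people.shape  # (number of rows, number of columns)

print(people)
print(n_rows, "rows and", n_cols, "columns")
```

Adding a fourth dictionary to the list would add a row; adding a new key to every dictionary would add a column.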

## Where to get data from?

### "From a friend"


We are given access to a file or register containing the data

There are many formats, but mostly we will deal with:

    .csv
    .tsv
    .txt
    .xlsx
    .json
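
As a rough sketch of how these formats map onto pandas readers, the example below parses the same two records from CSV, TSV and JSON text, then exports them back to CSV. The in-memory strings stand in for real files and are assumptions made so the example runs anywhere; with real files you would pass a path or URL instead. For `.xlsx` files pandas provides `pd.read_excel`, which needs an extra engine package such as `openpyxl`, so it is not run here.

```python
import io
import pandas as pd

# In-memory stand-ins for files on disk (illustrative content)
csv_text = "name,age,height\nNick,21,1.85\nChris,29,1.79\n"
tsv_text = csv_text.replace(",", "\t")
json_text = (
    '[{"name": "Nick", "age": 21, "height": 1.85},'
    ' {"name": "Chris", "age": 29, "height": 1.79}]'
)

from_csv = pd.read_csv(io.StringIO(csv_text))             # comma-separated
from_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")   # tab-separated
from_json = pd.read_json(io.StringIO(json_text))          # list of records

# Exporting: with no path given, to_csv returns the CSV text as a string
exported = from_csv.to_csv(index=False)
print(exported)
```

A `.txt` file is usually read with `pd.read_csv` as well, choosing `sep` to match whatever delimiter the file uses.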

In [None]:
## Example: reading a .csv file into Python with pandas
## This file contains the name, age and height of some participants
import pandas as pd
df = pd.read_csv('https://www.dropbox.com/s/9aiiad9j6zxs07i/data.csv?raw=1')
df 

Unnamed: 0,Col 1,Col 2,Col 3
0,Nick,21,1.85
1,Chris,29,1.79
2,Tim,28,1.75
3,Ron,34,1.81
4,Monica,35,1.69
5,Cassandra,21,1.66


### Online repos

* There is a huge amount of free data out there!
* Just be careful about where you get it from!
* Websites such as [Kaggle](https://www.kaggle.com), [UCI](https://archive.ics.uci.edu/ml/index.php) or [Keel](https://sci2s.ugr.es/keel/datasets.php) contain thousands of data sets that you can download and import into your preferred tool
* Some of them even offer faster and more secure ways to connect to the data

![Fig. 2](https://www.dropbox.com/s/t2klwwig3qj2dh2/fig2.jpg?raw=1)

### Modules and packages

Languages such as Python and R have packages that come with some preloaded data sets

These are not always tabular: they may come as arrays or dictionary-like objects

Nonetheless, they are good to start experimenting!

In [None]:
## Loading the Iris dataset from the scikit-learn package
from sklearn.datasets import load_iris
iris = load_iris()
iris

{'data': array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
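
The object returned by `load_iris()` is a dictionary-like container of arrays rather than a table. One possible way to bring it into the tabular form used in this lecture is sketched below; the column name `species` is an assumption chosen for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# The measurements become the columns, one row per flower
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Translate the numeric class codes (0, 1, 2) into readable names;
# 'species' is a column name made up for this example
iris_df["species"] = iris.target_names[iris.target]

print(iris_df.head())
```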
  

Another popular way to obtain data is **web scraping** (we'll talk about that later in this course)

## Exploring Data by Rows

Now that we have data in a tabular form, let's see how to access certain positions

To do so, let's use a larger dataset of Netflix Original series contained in a .csv file called `netflix.csv`

In [None]:
netflix = pd.read_csv('https://www.dropbox.com/s/pwqaqftq2m9pgdv/netflix.csv?raw=1')
netflix

,Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
0,House of Cards,Political drama,"political,drama",1-Feb-13,"6 seasons, 73 episodes",6,73,42–59 min.,42,59,Ended,0,Drama,English
1,Hemlock Grove,Horror/thriller,"horror,thriller",19-Apr-13,"3 seasons, 33 episodes",3,33,45–58 min.,45,58,Ended,0,Drama,English
2,Orange Is the New Black,Comedy-drama,comedy-drama,11-Jul-13,"6 seasons, 78 episodes",6,78,50–92 min.,50,92,Renewed,1,Drama,English
3,Marco Polo,Historical drama,"historical,drama",12-Dec-14,"2 seasons, 20 episodes",2,20,48–65 min.,48,65,Ended,0,Drama,English
4,Bloodline,Thriller,thriller,20-Mar-15,"3 seasons, 33 episodes",3,33,48–68 min.,48,68,Ended,0,Drama,English
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
353,Busted!,Korean language variety show,"korean,language,variety,show",4-May-18,"1 season, 10 episodes",1,10,75–98 min.,75,98,Renewed,1,VarietyTalk,Korean
354,The Break with Michelle Wolf,Late-night,late-night,27-May-18,"1 season, 10 episodes",1,10,27 min.,27,27,Renewed,1,VarietyTalk,English
355,Norm Macdonald Has a Show,Talk show,"talk,show",14-Sep-18,"1 season, 10 episodes",1,10,26–35 min.,26,35,Pending,1,VarietyTalk,English
356,Patriot Act with Hasan Minhaj,Talk show,"talk,show",28-Oct-18,"3 volumes, 19 episodes",0,19,23–30 min.,23,30,Renewed,1,VarietyTalk,English


Notice that when the `DataFrame` is shown in Jupyter, it displays `358 rows × 14 columns` at the bottom to tell us how large the data set is.
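
The same size information can be read programmatically. The sketch below uses a three-row stand-in called `sample` (an assumption, with values copied from rows of the Netflix table) so that it runs without downloading the file.

```python
import pandas as pd

# Small stand-in for the Netflix table (illustrative subset of rows and columns)
sample = pd.DataFrame({
    "Title": ["House of Cards", "Hemlock Grove", "Marco Polo"],
    "SeasonsParsed": [6, 3, 2],
    "EpisodesParsed": [73, 33, 20],
})

n_rows, n_cols = sample.shape   # tuple: (rows, columns)
print(n_rows, n_cols)
print(len(sample))              # number of rows only
print(list(sample.columns))     # column names
```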

### Using a Non-Numerical Index

We can use any column whose entries are unique as the index (for example `Title`, which is the first column, at position `0`)

In [None]:
# reloading the dataset, this time using the first column (Title) as the index
netflix = pd.read_csv('https://www.dropbox.com/s/pwqaqftq2m9pgdv/netflix.csv?raw=1', 
                      index_col=0)
netflix

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
House of Cards,Political drama,"political,drama",1-Feb-13,"6 seasons, 73 episodes",6,73,42–59 min.,42,59,Ended,0,Drama,English
Hemlock Grove,Horror/thriller,"horror,thriller",19-Apr-13,"3 seasons, 33 episodes",3,33,45–58 min.,45,58,Ended,0,Drama,English
Orange Is the New Black,Comedy-drama,comedy-drama,11-Jul-13,"6 seasons, 78 episodes",6,78,50–92 min.,50,92,Renewed,1,Drama,English
Marco Polo,Historical drama,"historical,drama",12-Dec-14,"2 seasons, 20 episodes",2,20,48–65 min.,48,65,Ended,0,Drama,English
Bloodline,Thriller,thriller,20-Mar-15,"3 seasons, 33 episodes",3,33,48–68 min.,48,68,Ended,0,Drama,English
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Busted!,Korean language variety show,"korean,language,variety,show",4-May-18,"1 season, 10 episodes",1,10,75–98 min.,75,98,Renewed,1,VarietyTalk,Korean
The Break with Michelle Wolf,Late-night,late-night,27-May-18,"1 season, 10 episodes",1,10,27 min.,27,27,Renewed,1,VarietyTalk,English
Norm Macdonald Has a Show,Talk show,"talk,show",14-Sep-18,"1 season, 10 episodes",1,10,26–35 min.,26,35,Pending,1,VarietyTalk,English
Patriot Act with Hasan Minhaj,Talk show,"talk,show",28-Oct-18,"3 volumes, 19 episodes",0,19,23–30 min.,23,30,Renewed,1,VarietyTalk,English


It's up to you to decide whether to use the default numerical index or a uniquely identifying column from your data set.
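
If the data is already loaded, the index can also be changed afterwards. The sketch below, on a small stand-in table named `sample` (an assumption), checks that a column is unique, promotes it to the index with `set_index`, and goes back to the default numbering with `reset_index`.

```python
import pandas as pd

# Small stand-in for the Netflix table (illustrative subset)
sample = pd.DataFrame({
    "Title": ["House of Cards", "Hemlock Grove", "Marco Polo"],
    "SeasonsParsed": [6, 3, 2],
})

# A column is only a safe index if no value repeats
print(sample["Title"].is_unique)

by_title = sample.set_index("Title")   # returns a new frame indexed by Title
restored = by_title.reset_index()      # back to the default 0, 1, 2, ... index

print(by_title)
print(restored)
```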

### Locating an item

In [None]:
# Locating all info of an entry
netflix.loc['Orange Is the New Black']

Genre                       Comedy-drama
GenreLabels                 comedy-drama
Premiere                       11-Jul-13
Seasons           6 seasons, 78 episodes
SeasonsParsed                          6
EpisodesParsed                        78
Length                        50–92 min.
MinLength                             50
MaxLength                             92
Status                           Renewed
Active                                 1
Table                              Drama
Language                         English
Name: Orange Is the New Black, dtype: object

In [None]:
# Locating particular info of that entry
netflix.at['Orange Is the New Black', 'Seasons']

'6 seasons, 78 episodes'

In [None]:
# iloc accesses rows by integer position, even when the index is not numerical
netflix.iloc[0]

Genre                    Political drama
GenreLabels              political,drama
Premiere                        1-Feb-13
Seasons           6 seasons, 73 episodes
SeasonsParsed                          6
EpisodesParsed                        73
Length                        42–59 min.
MinLength                             42
MaxLength                             59
Status                             Ended
Active                                 0
Table                              Drama
Language                         English
Name: House of Cards, dtype: object
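
The accessors used above come in label-based and position-based pairs. The following sketch puts them side by side on a small stand-in table named `shows` (an assumption, with values copied from the Netflix rows), and shows one way to check whether a label exists before asking for it.

```python
import pandas as pd

# Small stand-in for the Netflix table, indexed by title (illustrative subset)
shows = pd.DataFrame(
    {"SeasonsParsed": [6, 3, 6], "EpisodesParsed": [73, 33, 78]},
    index=pd.Index(
        ["House of Cards", "Hemlock Grove", "Orange Is the New Black"],
        name="Title",
    ),
)

row_by_label = shows.loc["Hemlock Grove"]                    # whole row, by label
cell_by_label = shows.at["Hemlock Grove", "EpisodesParsed"]  # one cell, by labels
row_by_position = shows.iloc[0]                              # whole row, by position
cell_by_position = shows.iat[0, 1]                           # one cell, by positions
last_row = shows.iloc[-1]                                    # negative positions count from the end

# Asking .loc for a label that is not in the index raises a KeyError,
# so it can be worth checking first
print("Unknown Show" in shows.index)
```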

### Getting a Subset of the Data

In [None]:
netflix.head(5)

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
House of Cards,Political drama,"political,drama",1-Feb-13,"6 seasons, 73 episodes",6,73,42–59 min.,42,59,Ended,0,Drama,English
Hemlock Grove,Horror/thriller,"horror,thriller",19-Apr-13,"3 seasons, 33 episodes",3,33,45–58 min.,45,58,Ended,0,Drama,English
Orange Is the New Black,Comedy-drama,comedy-drama,11-Jul-13,"6 seasons, 78 episodes",6,78,50–92 min.,50,92,Renewed,1,Drama,English
Marco Polo,Historical drama,"historical,drama",12-Dec-14,"2 seasons, 20 episodes",2,20,48–65 min.,48,65,Ended,0,Drama,English
Bloodline,Thriller,thriller,20-Mar-15,"3 seasons, 33 episodes",3,33,48–68 min.,48,68,Ended,0,Drama,English


In [None]:
netflix.tail(3)

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
Norm Macdonald Has a Show,Talk show,"talk,show",14-Sep-18,"1 season, 10 episodes",1,10,26–35 min.,26,35,Pending,1,VarietyTalk,English
Patriot Act with Hasan Minhaj,Talk show,"talk,show",28-Oct-18,"3 volumes, 19 episodes",0,19,23–30 min.,23,30,Renewed,1,VarietyTalk,English
The Fix,Panel show,"panel,show",14-Dec-18,"1 season, 10 episodes",1,10,23–31 min.,23,31,Pending,1,VarietyTalk,English


In [None]:
netflix.sample(5)

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
Terrorism Close Calls,Docu-series,docu-series,26-Oct-18,"1 season, 10 episodes",1,10,47–49 min.,47,49,Pending,1,DocuSeries,English
No Good Nick,Sitcom,sitcom,15-Apr-19,"1 part, 10 episodes",0,10,26–32 min.,26,32,Renewed,1,Children,English
Narcos,Crime drama,"crime,drama",28-Aug-15,"3 seasons, 30 episodes",3,30,43–60 min.,43,60,Ended,0,Drama,English
Dirty Money,Documentary,documentary,26-Jan-18,"1 season, 6 episodes",1,6,50–77 min.,50,77,Renewed,1,DocuSeries,English
"Love, Death & Robots",Anthology,anthology,15-Mar-19,"1 volume, 18 episodes",0,18,6–17 min.,6,17,Renewed,1,Animation,English
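
Because `sample` picks rows at random, running it again normally gives a different subset. A minimal sketch of how to make the draw repeatable is shown below; the stand-in table `shows` and the seed value `42` are arbitrary assumptions.

```python
import pandas as pd

# Small stand-in for the Netflix table (illustrative subset)
shows = pd.DataFrame(
    {"EpisodesParsed": [73, 33, 78, 20, 33]},
    index=pd.Index(
        ["House of Cards", "Hemlock Grove", "Orange Is the New Black",
         "Marco Polo", "Bloodline"],
        name="Title",
    ),
)

# Fixing random_state makes the "random" choice reproducible
picked_once = shows.sample(n=2, random_state=42)
picked_again = shows.sample(n=2, random_state=42)
print(picked_once.equals(picked_again))

# frac asks for a fraction of the rows instead of a fixed number
forty_percent = shows.sample(frac=0.4, random_state=42)
print(len(forty_percent))
```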


### Choosing Specific Rows

In [None]:
netflix.loc[['Diablero', 'Motown Magic', 'Typewriter']] # by index label

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
Diablero,Horror fantasy thriller,"horror,fantasy,thriller",21-Dec-18,"1 season, 8 episodes",1,8,36–44 min.,36,44,Renewed,1,ForeignLanguage,Spanish
Motown Magic,childrens-animation,childrens-animation,20-Nov-18,"1 season, 25 episodes",1,25,15–26 min.,15,26,Renewed,1,Children,English
Typewriter,Horror,horror,19-Jul-19,TBA,0,0,TBA,0,0,Pending,1,ForeignLanguage,Hindi


In [None]:
netflix.iloc[[40, 12, 106, 79]] # by integer position

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
Jessica Jones,Neo-noir/psychological thriller,"neo-noir,psychological,thriller",20-Nov-15,"3 seasons, 39 episodes",3,39,44–56 min.,44,56,Ended,0,Marvel,English
A Series of Unfortunate Events,Black-comedy mystery,"black-comedy,mystery",13-Jan-17,"3 seasons, 25 episodes",3,25,36–64 min.,36,64,Ended,0,Drama,English
Lost Song,Musical fantasy,"musical,fantasy",31-Mar-18,"1 season, 12 episodes",1,12,23–24 min.,23,24,Ended,0,Anime,English
After Life,Comedy,comedy,8-Mar-19,"1 season, 6 episodes",1,6,25–31 min.,25,31,Renewed,1,Comedy,English


In [None]:
netflix.loc['Maniac':'After Life'] # selecting a range by index label (both ends included)

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
Maniac,Dark comedy,"dark,comedy",21-Sep-18,10 episodes,0,10,27–47 min.,27,47,Miniseries,0,Comedy,English
The Kominsky Method,Comedy,comedy,16-Nov-18,"1 season, 8 episodes",1,8,23–34 min.,23,34,Renewed,1,Comedy,English
Sex Education,Coming-of-age comedy-drama,"coming-of-age,comedy-drama",11-Jan-19,"1 season, 8 episodes",1,8,47–53 min.,47,53,Renewed,1,Comedy,English
Russian Doll,Comedy,comedy,1-Feb-19,"1 season, 8 episodes",1,8,25–30 min.,25,30,Renewed,1,Comedy,English
After Life,Comedy,comedy,8-Mar-19,"1 season, 6 episodes",1,6,25–31 min.,25,31,Renewed,1,Comedy,English


In [None]:
netflix.iloc[75:79] # selecting a range by integer position (position 79 is excluded)

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
Maniac,Dark comedy,"dark,comedy",21-Sep-18,10 episodes,0,10,27–47 min.,27,47,Miniseries,0,Comedy,English
The Kominsky Method,Comedy,comedy,16-Nov-18,"1 season, 8 episodes",1,8,23–34 min.,23,34,Renewed,1,Comedy,English
Sex Education,Coming-of-age comedy-drama,"coming-of-age,comedy-drama",11-Jan-19,"1 season, 8 episodes",1,8,47–53 min.,47,53,Renewed,1,Comedy,English
Russian Doll,Comedy,comedy,1-Feb-19,"1 season, 8 episodes",1,8,25–30 min.,25,30,Renewed,1,Comedy,English
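
The two range selections above return five and four rows respectively, because label slices with `loc` include the end label while position slices with `iloc` exclude the end position. The sketch below reproduces this on a five-row stand-in table named `shows` (an assumption, with values copied from the rows shown).

```python
import pandas as pd

# Stand-in for rows 75 to 79 of the Netflix table (illustrative subset)
shows = pd.DataFrame(
    {"EpisodesParsed": [10, 8, 8, 8, 6]},
    index=pd.Index(
        ["Maniac", "The Kominsky Method", "Sex Education",
         "Russian Doll", "After Life"],
        name="Title",
    ),
)

by_label = shows.loc["Maniac":"After Life"]  # end label IS included
by_position = shows.iloc[0:4]                # end position is NOT included

print(len(by_label), len(by_position))
print(by_position.index[-1])
```

To get the same five rows by position, the slice would need to be `shows.iloc[0:5]`.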


### Filtering Rows by Values in Columns

One of the most important operations is filtering: keeping only the observations that meet some criterion

We can define a **condition** that is checked against every entry, giving `True` or `False` for each row

In [None]:
our_condition = netflix['SeasonsParsed'] == 6
our_condition

Title
House of Cards                    True
Hemlock Grove                    False
Orange Is the New Black           True
Marco Polo                       False
Bloodline                        False
                                 ...  
Busted!                          False
The Break with Michelle Wolf     False
Norm Macdonald Has a Show        False
Patriot Act with Hasan Minhaj    False
The Fix                          False
Name: SeasonsParsed, Length: 358, dtype: bool
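
A condition like this is just a column of booleans, so it can also be used to count matches before doing any filtering. A minimal sketch on a five-row stand-in table named `shows` (an assumption, with values copied from the first rows of the data set):

```python
import pandas as pd

# Small stand-in for the Netflix table (illustrative subset)
shows = pd.DataFrame(
    {"SeasonsParsed": [6, 3, 6, 2, 3]},
    index=pd.Index(
        ["House of Cards", "Hemlock Grove", "Orange Is the New Black",
         "Marco Polo", "Bloodline"],
        name="Title",
    ),
)

has_six_seasons = shows["SeasonsParsed"] == 6

# True counts as 1 and False as 0, so sum() counts the matching rows
n_matches = int(has_six_seasons.sum())
# mean() gives the share of rows that match
share = float(has_six_seasons.mean())

print(n_matches, share)
```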

Then, we can ask for that condition to be the filter of the original dataset

In [None]:
netflix[our_condition]

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
House of Cards,Political drama,"political,drama",1-Feb-13,"6 seasons, 73 episodes",6,73,42–59 min.,42,59,Ended,0,Drama,English
Orange Is the New Black,Comedy-drama,comedy-drama,11-Jul-13,"6 seasons, 78 episodes",6,78,50–92 min.,50,92,Renewed,1,Drama,English
The Adventures of Puss in Boots,childrens-animation,childrens-animation,16-Jan-15,"6 seasons, 78 episodes",6,78,22–24 min.,22,24,Ended,0,Children,English
Dragons: Race to the Edge,childrens-animation,childrens-animation,26-Jun-15,"6 seasons, 78 episodes",6,78,22–23 min.,22,23,Ended,0,Children,English
Trolls: The Beat Goes On!,childrens-animation,childrens-animation,19-Jan-18,"6 seasons, 38 episodes",6,38,24–25 min.,24,25,Pending,1,Children,English


In [None]:
netflix[netflix['SeasonsParsed'] == 6] # the same, but in one line

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
House of Cards,Political drama,"political,drama",1-Feb-13,"6 seasons, 73 episodes",6,73,42–59 min.,42,59,Ended,0,Drama,English
Orange Is the New Black,Comedy-drama,comedy-drama,11-Jul-13,"6 seasons, 78 episodes",6,78,50–92 min.,50,92,Renewed,1,Drama,English
The Adventures of Puss in Boots,childrens-animation,childrens-animation,16-Jan-15,"6 seasons, 78 episodes",6,78,22–24 min.,22,24,Ended,0,Children,English
Dragons: Race to the Edge,childrens-animation,childrens-animation,26-Jun-15,"6 seasons, 78 episodes",6,78,22–23 min.,22,23,Ended,0,Children,English
Trolls: The Beat Goes On!,childrens-animation,childrens-animation,19-Jan-18,"6 seasons, 38 episodes",6,38,24–25 min.,24,25,Pending,1,Children,English


In [None]:
netflix[netflix['MaxLength'] > 100] # series whose longest episode exceeds 100 minutes

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
Sense8,Science fiction,science-fiction,5-Jun-15,"2 seasons, 24 episodes",2,24,45–152 min.,45,152,Ended,0,Drama,English
Gilmore Girls: A Year in the Life,Family drama,"family,drama",25-Nov-16,4 episodes,0,4,88–102 min.,88,102,Miniseries,0,Drama,English


In [None]:
# series with a number of episodes between 30 and 34
netflix[(netflix['EpisodesParsed'] >= 30) & (netflix['EpisodesParsed'] < 35)]

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
Hemlock Grove,Horror/thriller,"horror,thriller",19-Apr-13,"3 seasons, 33 episodes",3,33,45–58 min.,45,58,Ended,0,Drama,English
Bloodline,Thriller,thriller,20-Mar-15,"3 seasons, 33 episodes",3,33,48–68 min.,48,68,Ended,0,Drama,English
Narcos,Crime drama,"crime,drama",28-Aug-15,"3 seasons, 30 episodes",3,30,43–60 min.,43,60,Ended,0,Drama,English
Love,Romantic comedy,"romantic,comedy",19-Feb-16,"3 seasons, 34 episodes",3,34,24–40 min.,24,40,Ended,0,Comedy,English
Santa Clarita Diet,Comedy-horror,comedy-horror,3-Feb-17,"3 seasons, 30 episodes",3,30,26–35 min.,26,35,Ended,0,Comedy,English
Go! Live Your Way,Musical,musical,22-Feb-19,"2 seasons, 30 episodes",2,30,36–44 min.,36,44,Pending,1,ForeignLanguage,Spanish
Chef's Table,Culinary art,"culinary,art",26-Apr-15,"6 volumes, 30 episodes",0,30,42–58 min.,42,58,Renewed,1,DocuSeries,English


In [None]:
# with 33 or 73 episodes
netflix[(netflix['EpisodesParsed'] == 33) | (netflix['EpisodesParsed'] == 73)]

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
House of Cards,Political drama,"political,drama",1-Feb-13,"6 seasons, 73 episodes",6,73,42–59 min.,42,59,Ended,0,Drama,English
Hemlock Grove,Horror/thriller,"horror,thriller",19-Apr-13,"3 seasons, 33 episodes",3,33,45–58 min.,45,58,Ended,0,Drama,English
Bloodline,Thriller,thriller,20-Mar-15,"3 seasons, 33 episodes",3,33,48–68 min.,48,68,Ended,0,Drama,English
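
The combined conditions above need parentheses around each comparison because `&` and `|` bind more tightly than `==` or `>=` in Python. As an alternative sketch, pandas also offers `between` and `isin` for these two common cases, and `~` to negate a condition. The stand-in table `shows` is an assumption, with values copied from the data set.

```python
import pandas as pd

# Small stand-in for the Netflix table (illustrative subset)
shows = pd.DataFrame(
    {"EpisodesParsed": [73, 33, 20, 33, 30, 34]},
    index=pd.Index(
        ["House of Cards", "Hemlock Grove", "Marco Polo",
         "Bloodline", "Narcos", "Love"],
        name="Title",
    ),
)

# between() includes both ends by default: 30 <= episodes <= 34
in_range = shows[shows["EpisodesParsed"].between(30, 34)]

# isin() replaces a chain of == comparisons joined with |
either = shows[shows["EpisodesParsed"].isin([33, 73])]

# ~ negates a condition: everything that does NOT have 33 or 73 episodes
neither = shows[~shows["EpisodesParsed"].isin([33, 73])]

print(list(in_range.index))
print(list(either.index))
print(list(neither.index))
```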


### Sorting Rows

Let's create a new frame called `variable_length_shows` that contains the series whose longest episode is more than 45 minutes longer than the shortest

In [None]:
variable_length_shows = netflix[netflix['MaxLength'] > netflix['MinLength'] + 45]
variable_length_shows

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
Sense8,Science fiction,science-fiction,5-Jun-15,"2 seasons, 24 episodes",2,24,45–152 min.,45,152,Ended,0,Drama,English
Kong: King of the Apes,childrens-animation,childrens-animation,15-Apr-16,"2 seasons, 23 episodes",2,23,22–85 min.,22,85,Ended,0,Children,English
Club de Cuervos,Comedy-drama,comedy-drama,7-Aug-15,"4 seasons, 45 episodes",4,45,36–94 min.,36,94,Ended,0,ForeignLanguage,Spanish


We can use the `sort_values` method to sort on a chosen variable.

In [None]:
variable_length_shows.sort_values('MaxLength')

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
Kong: King of the Apes,childrens-animation,childrens-animation,15-Apr-16,"2 seasons, 23 episodes",2,23,22–85 min.,22,85,Ended,0,Children,English
Club de Cuervos,Comedy-drama,comedy-drama,7-Aug-15,"4 seasons, 45 episodes",4,45,36–94 min.,36,94,Ended,0,ForeignLanguage,Spanish
Sense8,Science fiction,science-fiction,5-Jun-15,"2 seasons, 24 episodes",2,24,45–152 min.,45,152,Ended,0,Drama,English


We can also sort in reverse (descending) order using `ascending=False`.

In [None]:
variable_length_shows.sort_values('MaxLength', ascending=False)

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
Sense8,Science fiction,science-fiction,5-Jun-15,"2 seasons, 24 episodes",2,24,45–152 min.,45,152,Ended,0,Drama,English
Club de Cuervos,Comedy-drama,comedy-drama,7-Aug-15,"4 seasons, 45 episodes",4,45,36–94 min.,36,94,Ended,0,ForeignLanguage,Spanish
Kong: King of the Apes,childrens-animation,childrens-animation,15-Apr-16,"2 seasons, 23 episodes",2,23,22–85 min.,22,85,Ended,0,Children,English


Note that `sort_values` does **not** modify the data frame itself; it returns a sorted copy

If we display the original, we will see it is still unsorted!

In [None]:
variable_length_shows

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language
Sense8,Science fiction,science-fiction,5-Jun-15,"2 seasons, 24 episodes",2,24,45–152 min.,45,152,Ended,0,Drama,English
Kong: King of the Apes,childrens-animation,childrens-animation,15-Apr-16,"2 seasons, 23 episodes",2,23,22–85 min.,22,85,Ended,0,Children,English
Club de Cuervos,Comedy-drama,comedy-drama,7-Aug-15,"4 seasons, 45 episodes",4,45,36–94 min.,36,94,Ended,0,ForeignLanguage,Spanish
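
To keep the sorted order, one option is to store the result of `sort_values` under a name. The sketch below does this on a three-row stand-in table named `shows` (an assumption, with values copied from the rows above) and also shows sorting by more than one column.

```python
import pandas as pd

# Stand-in for variable_length_shows (illustrative subset of columns)
shows = pd.DataFrame(
    {"MinLength": [45, 22, 36], "MaxLength": [152, 85, 94]},
    index=pd.Index(
        ["Sense8", "Kong: King of the Apes", "Club de Cuervos"],
        name="Title",
    ),
)

# Store the sorted copy; the original frame keeps its order
sorted_shows = shows.sort_values("MaxLength")

# Sorting by several columns, each with its own direction
by_two_keys = shows.sort_values(["MinLength", "MaxLength"], ascending=[False, True])

print(list(sorted_shows.index))
print(list(shows.index))
```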


We can also create a new data frame that contains only certain columns

In [None]:
shorter_netflix = netflix[['Genre', 'SeasonsParsed']]
shorter_netflix

Title,Genre,SeasonsParsed
House of Cards,Political drama,6
Hemlock Grove,Horror/thriller,3
Orange Is the New Black,Comedy-drama,6
Marco Polo,Historical drama,2
Bloodline,Thriller,3
...,...,...
Busted!,Korean language variety show,1
The Break with Michelle Wolf,Late-night,1
Norm Macdonald Has a Show,Talk show,1
Patriot Act with Hasan Minhaj,Talk show,0
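
The double square brackets matter here. As a minimal sketch on a stand-in table named `shows` (an assumption), a single column name returns a `Series`, a list of names returns a `DataFrame`, and adding `.copy()` gives an independent table that can be changed without touching the original.

```python
import pandas as pd

# Small stand-in for the Netflix table (illustrative subset)
shows = pd.DataFrame(
    {
        "Genre": ["Political drama", "Horror/thriller"],
        "SeasonsParsed": [6, 3],
        "EpisodesParsed": [73, 33],
    },
    index=pd.Index(["House of Cards", "Hemlock Grove"], name="Title"),
)

genre_series = shows["Genre"]    # one name  -> Series
genre_frame = shows[["Genre"]]   # list of names -> DataFrame (even with one name)

# An explicit copy can be edited freely
subset = shows[["Genre", "SeasonsParsed"]].copy()
subset["SeasonsParsed"] = 0

print(type(genre_series).__name__, type(genre_frame).__name__)
print(shows.loc["House of Cards", "SeasonsParsed"])  # original value is untouched
```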


One final useful trick: we can add new columns to a data frame

In [None]:
# creating a new column named 'Watched' filled with 'No'
# (assigning a single value repeats it for every row)
netflix['Watched'] = 'No'
netflix

Title,Genre,GenreLabels,Premiere,Seasons,SeasonsParsed,EpisodesParsed,Length,MinLength,MaxLength,Status,Active,Table,Language,Watched
House of Cards,Political drama,"political,drama",1-Feb-13,"6 seasons, 73 episodes",6,73,42–59 min.,42,59,Ended,0,Drama,English,No
Hemlock Grove,Horror/thriller,"horror,thriller",19-Apr-13,"3 seasons, 33 episodes",3,33,45–58 min.,45,58,Ended,0,Drama,English,No
Orange Is the New Black,Comedy-drama,comedy-drama,11-Jul-13,"6 seasons, 78 episodes",6,78,50–92 min.,50,92,Renewed,1,Drama,English,No
Marco Polo,Historical drama,"historical,drama",12-Dec-14,"2 seasons, 20 episodes",2,20,48–65 min.,48,65,Ended,0,Drama,English,No
Bloodline,Thriller,thriller,20-Mar-15,"3 seasons, 33 episodes",3,33,48–68 min.,48,68,Ended,0,Drama,English,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Busted!,Korean language variety show,"korean,language,variety,show",4-May-18,"1 season, 10 episodes",1,10,75–98 min.,75,98,Renewed,1,VarietyTalk,Korean,No
The Break with Michelle Wolf,Late-night,late-night,27-May-18,"1 season, 10 episodes",1,10,27 min.,27,27,Renewed,1,VarietyTalk,English,No
Norm Macdonald Has a Show,Talk show,"talk,show",14-Sep-18,"1 season, 10 episodes",1,10,26–35 min.,26,35,Pending,1,VarietyTalk,English,No
Patriot Act with Hasan Minhaj,Talk show,"talk,show",28-Oct-18,"3 volumes, 19 episodes",0,19,23–30 min.,23,30,Renewed,1,VarietyTalk,English,No
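
New columns can also be computed from existing ones, and individual cells can be updated afterwards. The sketch below works on a three-row stand-in table named `shows`; the column name `LengthRange` and the choice of which show to mark as watched are assumptions made for illustration.

```python
import pandas as pd

# Small stand-in for the Netflix table (illustrative subset)
shows = pd.DataFrame(
    {"MinLength": [45, 22, 36], "MaxLength": [152, 85, 94]},
    index=pd.Index(
        ["Sense8", "Kong: King of the Apes", "Club de Cuervos"],
        name="Title",
    ),
)

# A single value is repeated for every row
shows["Watched"] = "No"

# A column computed row by row from two existing columns
# ('LengthRange' is a hypothetical name for this example)
shows["LengthRange"] = shows["MaxLength"] - shows["MinLength"]

# Updating one cell by its row and column labels
shows.at["Sense8", "Watched"] = "Yes"

print(shows)
```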


## Lab Activity