# Notes for YouTube Python Tutorials
## Python Pandas Tutorial
https://m.youtube.com/watch?v=ZyhVh-qRZPA&list=PL-osiE80TeTsWmV9i9c58mdDCSsklFdDS&index=2&t=247s

## Python Pandas Tutorial (Part 6): Add/Remove Rows and Columns From DataFrames

The sample data (Stack Overflow Developer Survey 2019) was downloaded from the following link:<br>
https://insights.stackoverflow.com/survey

In [1]:
import pandas as pd


# Small sample example
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"]
}

people_df = pd.DataFrame(people, columns=['first', 'last', 'email'])
people_df

|   | first | last | email |
|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail.com |
| 1 | Jane | Doe | JaneDoe@email.com |
| 2 | John | Doe | JohnDoe@email.com |


In [2]:
# Real world example: Stack Overflow Developer Survey 2019 results
# index_col turns the named column into the row index
df = pd.read_csv('Python Data/developer_survey_2019/survey_results_public.csv',
                 index_col='Respondent')
schema_df = pd.read_csv('Python Data/developer_survey_2019/survey_results_schema.csv',
                        index_col='Column')

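The `index_col` argument makes the named column the row index instead of the default 0..n-1 range. Below is a minimal sketch, assuming a tiny made-up CSV held in memory; `csv_text` and its values are hypothetical and not taken from the real survey file:

```python
import io

import pandas as pd

# Hypothetical miniature CSV standing in for survey_results_public.csv
csv_text = """Respondent,Country,ConvertedComp
1,Canada,61000.0
2,India,
3,Germany,95000.0
"""

# index_col="Respondent" promotes that column to the row index,
# so rows can be looked up by respondent ID with .loc
survey = pd.read_csv(io.StringIO(csv_text), index_col="Respondent")

print(survey.index.name)         # Respondent
print(survey.loc[3, "Country"])  # Germany
print(survey.columns.tolist())   # ['Country', 'ConvertedComp']
```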
### Small Sample Example

In [3]:
# Add a new column by joining two string columns with a space
people_df['full_name'] = people_df['first'] + ' ' + people_df['last']
people_df

|   | first | last | email | full_name |
|---|---|---|---|---|
| 0 | Corey | Schafer | CoreyMSchafer@gmail.com | Corey Schafer |
| 1 | Jane | Doe | JaneDoe@email.com | Jane Doe |
| 2 | John | Doe | JohnDoe@email.com | John Doe |

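The same column can also be built with `Series.str.cat`, and `DataFrame.assign` does it without modifying the original frame. A small sketch on a copy of the sample names (the variable `combined` is illustrative only):

```python
import pandas as pd

people = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
})

# Series.str.cat joins two string columns element-wise with a separator;
# assign() returns a new DataFrame and leaves `people` unchanged
combined = people.assign(full_name=people["first"].str.cat(people["last"], sep=" "))

print(combined["full_name"].tolist())  # ['Corey Schafer', 'Jane Doe', 'John Doe']
print("full_name" in people.columns)   # False
```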

In [4]:
# Remove columns (inplace=True modifies people_df directly)
people_df.drop(columns=['first', 'last'], inplace=True)
people_df

|   | email | full_name |
|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer |
| 1 | JaneDoe@email.com | Jane Doe |
| 2 | JohnDoe@email.com | John Doe |

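Without `inplace=True`, `drop` returns a new DataFrame and the original keeps all its columns; `errors="ignore"` skips labels that do not exist instead of raising a `KeyError`. A short sketch with a trimmed copy of the sample data (`"middle"` is a deliberately nonexistent, hypothetical column):

```python
import pandas as pd

people = pd.DataFrame({
    "first": ["Corey", "Jane"],
    "last": ["Schafer", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com"],
})

# No inplace=True: drop() returns a new DataFrame, people is untouched
trimmed = people.drop(columns=["first", "last"])

# errors="ignore" silently skips the missing "middle" column
same = trimmed.drop(columns=["middle"], errors="ignore")

print(people.columns.tolist())   # ['first', 'last', 'email']
print(trimmed.columns.tolist())  # ['email']
print(same.columns.tolist())     # ['email']
```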

In [5]:
# Split one column into two (without expand, this returns a Series of lists)
people_df['full_name'].str.split(' ')

0    [Corey, Schafer]
1         [Jane, Doe]
2         [John, Doe]
Name: full_name, dtype: object

In [6]:
# expand=True returns a DataFrame (one column per piece) instead of a Series of lists
people_df[['first', 'last']] = people_df['full_name'].str.split(' ', expand=True)
people_df

|   | email | full_name | first | last |
|---|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer | Corey | Schafer |
| 1 | JaneDoe@email.com | Jane Doe | Jane | Doe |
| 2 | JohnDoe@email.com | John Doe | John | Doe |

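Assigning `str.split(' ', expand=True)` to two columns only works when every name has exactly one space. Passing `n=1` splits at the first space only, so the result always has two columns. A sketch, assuming a hypothetical three-word name that is not in the original data:

```python
import pandas as pd

# "Mary Jane Watson" is a hypothetical three-word name
names = pd.Series(["Corey Schafer", "Mary Jane Watson"])

# Without n, the longest name decides the number of columns
print(names.str.split(" ", expand=True).shape)  # (2, 3)

# n=1 splits only at the first space, giving exactly two columns
parts = names.str.split(" ", n=1, expand=True)
print(parts[0].tolist())  # ['Corey', 'Mary']
print(parts[1].tolist())  # ['Schafer', 'Jane Watson']
```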

In [7]:
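# Add a single row of data
# Appending a dict requires ignore_index=True; otherwise pandas raises a TypeError.
# Columns missing from the dict (email, full_name, last) are filled with NaN.
# The result is not assigned back, so people_df itself is unchanged.
# Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.
people_df.append({'first': 'Tony'}, ignore_index=True)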

|   | email | full_name | first | last |
|---|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer | Corey | Schafer |
| 1 | JaneDoe@email.com | Jane Doe | Jane | Doe |
| 2 | JohnDoe@email.com | John Doe | John | Doe |
| 3 | NaN | NaN | Tony | NaN |

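Since `DataFrame.append` no longer exists in pandas 2.0, the usual replacement for adding one row is to wrap the dict in a one-row DataFrame and use `pd.concat`. A minimal sketch on a shortened copy of the sample data (the name `new_row` is illustrative):

```python
import pandas as pd

people = pd.DataFrame({
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com"],
    "first": ["Corey", "Jane"],
    "last": ["Schafer", "Doe"],
})

# Wrap the single row in a one-row DataFrame, then concatenate;
# ignore_index=True renumbers the result 0..n-1
new_row = pd.DataFrame([{"first": "Tony"}])
result = pd.concat([people, new_row], ignore_index=True)

print(result.loc[2, "first"])   # Tony
print(result.loc[2, "email"])   # nan (missing columns are filled with NaN)
print(result.columns.tolist())  # ['email', 'first', 'last']
```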

In [8]:
people2 = {
    'first': ['Tony', 'Steve'],
    'last': ['Stark', 'Rogers'],
    'email': ['IronMan@avenge.com', 'Cap@avenge.com']
}

# Note: The column order of people_df2 is not the same as people_df's.
people_df2 = pd.DataFrame(people2)
people_df2

|   | first | last | email |
|---|---|---|---|
| 0 | Tony | Stark | IronMan@avenge.com |
| 1 | Steve | Rogers | Cap@avenge.com |


In [9]:
# Because people_df and people_df2 have different column orders,
# older pandas versions may warn that the columns will be sorted.
# Passing sort=False keeps people_df's column order and silences the warning.
people_df = people_df.append(people_df2, ignore_index=True, sort=False)
people_df

|   | email | full_name | first | last |
|---|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer | Corey | Schafer |
| 1 | JaneDoe@email.com | Jane Doe | Jane | Doe |
| 2 | JohnDoe@email.com | John Doe | John | Doe |
| 3 | IronMan@avenge.com | NaN | Tony | Stark |
| 4 | Cap@avenge.com | NaN | Steve | Rogers |

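In pandas 2.0 and later the same two-frame append is written with `pd.concat`, which aligns columns by name rather than by position; `sort=False` keeps the first frame's column order. A sketch using a shortened copy of the sample data:

```python
import pandas as pd

people = pd.DataFrame({
    "email": ["JaneDoe@email.com"],
    "full_name": ["Jane Doe"],
    "first": ["Jane"],
    "last": ["Doe"],
})
people2 = pd.DataFrame({
    "first": ["Tony", "Steve"],
    "last": ["Stark", "Rogers"],
    "email": ["IronMan@avenge.com", "Cap@avenge.com"],
})

# Columns are matched by name; people2 has no full_name, so those cells become NaN
combined = pd.concat([people, people2], ignore_index=True, sort=False)

print(combined.columns.tolist())           # ['email', 'full_name', 'first', 'last']
print(combined["first"].tolist())          # ['Jane', 'Tony', 'Steve']
print(combined["full_name"].isna().sum())  # 2
```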

In [10]:
# Remove a single row (drop returns a copy; people_df is unchanged)
people_df.drop(index=4)

|   | email | full_name | first | last |
|---|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer | Corey | Schafer |
| 1 | JaneDoe@email.com | Jane Doe | Jane | Doe |
| 2 | JohnDoe@email.com | John Doe | John | Doe |
| 3 | IronMan@avenge.com | NaN | Tony | Stark |


In [11]:
# Remove rows by condition (again returns a copy)
filt = people_df['last'] == 'Doe'
people_df.drop(index=people_df[filt].index)

|   | email | full_name | first | last |
|---|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer | Corey | Schafer |
| 3 | IronMan@avenge.com | NaN | Tony | Stark |
| 4 | Cap@avenge.com | NaN | Steve | Rogers |

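Dropping the index labels of the matching rows gives the same result as keeping the rows where the filter is `False`, which is often written with `~` as a boolean mask. A small sketch comparing the two on a subset of the sample names:

```python
import pandas as pd

people = pd.DataFrame({
    "first": ["Corey", "Jane", "John", "Tony"],
    "last": ["Schafer", "Doe", "Doe", "Stark"],
})

filt = people["last"] == "Doe"

# ~filt inverts the mask: keep every row that is NOT a Doe
kept = people[~filt]
# The drop-by-index approach from the cell above
dropped = people.drop(index=people[filt].index)

print(kept.index.tolist())   # [0, 3]
print(kept.equals(dropped))  # True
```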

In [12]:
# Remove rows whose full_name is missing, and keep the result
filt = pd.isna(people_df['full_name'])
people_df = people_df.drop(people_df[filt].index)
people_df

|   | email | full_name | first | last |
|---|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey Schafer | Corey | Schafer |
| 1 | JaneDoe@email.com | Jane Doe | Jane | Doe |
| 2 | JohnDoe@email.com | John Doe | John | Doe |

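Removing rows where a particular column is missing is common enough that `DataFrame.dropna` has a `subset` parameter for it. A sketch checking that it matches the `pd.isna` + `drop` approach above on a shortened copy of the data:

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({
    "email": ["CoreyMSchafer@gmail.com", "IronMan@avenge.com", "Cap@avenge.com"],
    "full_name": ["Corey Schafer", np.nan, np.nan],
})

# dropna(subset=[...]) removes rows where any listed column is missing
cleaned = people.dropna(subset=["full_name"])
# The manual version used in the cell above
manual = people.drop(people[pd.isna(people["full_name"])].index)

print(cleaned["email"].tolist())  # ['CoreyMSchafer@gmail.com']
print(cleaned.equals(manual))     # True
```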

## Python Pandas Tutorial (Part 7): Sorting Data

### Small Sample Example

In [13]:
# Adjust the sample data for the sorting examples.
people_df.drop(columns='full_name', inplace=True)
people_df = people_df.append({'first': 'Adam', 'last': 'Doe', 'email': 'A@email.com'}, ignore_index=True)
people_df

|   | email | first | last |
|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey | Schafer |
| 1 | JaneDoe@email.com | Jane | Doe |
| 2 | JohnDoe@email.com | John | Doe |
| 3 | A@email.com | Adam | Doe |


In [14]:
# Sort by last name in descending order
people_df.sort_values(by='last', ascending=False)

|   | email | first | last |
|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey | Schafer |
| 1 | JaneDoe@email.com | Jane | Doe |
| 2 | JohnDoe@email.com | John | Doe |
| 3 | A@email.com | Adam | Doe |


In [15]:
# Both last and first in descending order.
people_df.sort_values(by=['last', 'first'], ascending=False)

|   | email | first | last |
|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey | Schafer |
| 2 | JohnDoe@email.com | John | Doe |
| 1 | JaneDoe@email.com | Jane | Doe |
| 3 | A@email.com | Adam | Doe |


In [16]:
# Last name in descending order, then first name in ascending order.
people_df.sort_values(by=['last', 'first'], ascending=[False, True], inplace=True)
people_df

|   | email | first | last |
|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey | Schafer |
| 3 | A@email.com | Adam | Doe |
| 1 | JaneDoe@email.com | Jane | Doe |
| 2 | JohnDoe@email.com | John | Doe |

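When `by` lists several columns, `ascending` can take one flag per column; ties in the first column are broken by the second. A self-contained sketch reproducing the mixed-order sort above on the same four names:

```python
import pandas as pd

people = pd.DataFrame({
    "first": ["Corey", "Jane", "John", "Adam"],
    "last": ["Schafer", "Doe", "Doe", "Doe"],
})

# ascending[i] applies to by[i]: last descending, then first ascending among the Does
ordered = people.sort_values(by=["last", "first"], ascending=[False, True])

print(ordered["first"].tolist())  # ['Corey', 'Adam', 'Jane', 'John']
print(ordered.index.tolist())     # [0, 3, 1, 2]
```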

In [17]:
# Restore the original row order by sorting on the index (returns a copy; people_df stays sorted)
people_df.sort_index()

|   | email | first | last |
|---|---|---|---|
| 0 | CoreyMSchafer@gmail.com | Corey | Schafer |
| 1 | JaneDoe@email.com | Jane | Doe |
| 2 | JohnDoe@email.com | John | Doe |
| 3 | A@email.com | Adam | Doe |


In [18]:
# Sort a single column (a Series); each value keeps its original index label.
people_df['last'].sort_values()

3        Doe
1        Doe
2        Doe
0    Schafer
Name: last, dtype: object

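A sorted Series keeps its original index labels, as the output above shows. If a fresh 0..n-1 index is wanted, `sort_values(ignore_index=True)` (available since pandas 1.0) relabels the result, while `sort_index()` goes back to label order. A sketch using the `last` values in the index order left by the in-place sort:

```python
import pandas as pd

# Same values and index labels as people_df['last'] after the in-place sort
last = pd.Series(["Schafer", "Doe", "Doe", "Doe"], index=[0, 3, 1, 2], name="last")

# Default: sorted by value, original labels kept
print(last.sort_values().tolist())  # ['Doe', 'Doe', 'Doe', 'Schafer']

# ignore_index=True relabels the sorted result 0..n-1
relabeled = last.sort_values(ignore_index=True)
print(relabeled.index.tolist())  # [0, 1, 2, 3]

# sort_index() orders by label instead of by value
print(last.sort_index().tolist())  # ['Schafer', 'Doe', 'Doe', 'Doe']
```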
### Real World Example

In [19]:
# Sort the whole survey alphabetically by country, in place
df.sort_values(by='Country', inplace=True)
df['Country'].head(10)

Respondent
39258    Afghanistan
63129    Afghanistan
85715    Afghanistan
50767    Afghanistan
2782     Afghanistan
63019    Afghanistan
6417     Afghanistan
40000    Afghanistan
88731    Afghanistan
48436    Afghanistan
Name: Country, dtype: object

In [20]:
df[['Country', 'ConvertedComp']].head(10)

| Respondent | Country | ConvertedComp |
|---|---|---|
| 39258 | Afghanistan | 19152.0 |
| 63129 | Afghanistan | 1000000.0 |
| 85715 | Afghanistan | NaN |
| 50767 | Afghanistan | NaN |
| 2782 | Afghanistan | NaN |
| 63019 | Afghanistan | NaN |
| 6417 | Afghanistan | NaN |
| 40000 | Afghanistan | NaN |
| 88731 | Afghanistan | NaN |
| 48436 | Afghanistan | 4464.0 |


In [21]:
# Country ascending, then ConvertedComp descending within each country
df.sort_values(by=['Country', 'ConvertedComp'], ascending=[True, False], inplace=True)
df[['Country', 'ConvertedComp']].head(10)

| Respondent | Country | ConvertedComp |
|---|---|---|
| 63129 | Afghanistan | 1000000.0 |
| 50499 | Afghanistan | 153216.0 |
| 39258 | Afghanistan | 19152.0 |
| 58450 | Afghanistan | 17556.0 |
| 7085 | Afghanistan | 14364.0 |
| 22450 | Afghanistan | 7980.0 |
| 48436 | Afghanistan | 4464.0 |
| 10746 | Afghanistan | 3996.0 |
| 8149 | Afghanistan | 1596.0 |
| 29736 | Afghanistan | 1116.0 |

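The missing `ConvertedComp` values disappear from the top of each country because `sort_values` places NaN last by default, whether the sort is ascending or descending; `na_position="first"` moves them to the top. A sketch on a few hypothetical rows (the values are made up, not survey data):

```python
import numpy as np
import pandas as pd

# Hypothetical rows; NaN marks a missing compensation answer
comp = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Afghanistan", "Albania"],
    "ConvertedComp": [19152.0, np.nan, 1000000.0, np.nan],
})

# Default na_position="last": NaNs end up after the real values in each group
default = comp.sort_values(by=["Country", "ConvertedComp"], ascending=[True, False])
print(default["ConvertedComp"].tolist())  # [1000000.0, 19152.0, nan, nan]

# na_position="first" puts the missing values at the top instead
nan_first = comp.sort_values(by="ConvertedComp", na_position="first")
print(nan_first["ConvertedComp"].isna().tolist())  # [True, True, False, False]
```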

In [22]:
# Get the 10 largest values of a single column (returns a Series)
df['ConvertedComp'].nlargest(10)

Respondent
25983    2000000.0
87896    2000000.0
22013    2000000.0
28243    2000000.0
72732    2000000.0
78151    2000000.0
80200    2000000.0
52132    2000000.0
75561    2000000.0
32250    2000000.0
Name: ConvertedComp, dtype: float64

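`nlargest(n)` returns the same values as sorting in descending order and taking `head(n)`. Because many respondents share the top value here, the `keep` parameter matters: `keep="all"` returns every row tied with the n-th value, so the result can be longer than `n`. A sketch with hypothetical compensation values:

```python
import pandas as pd

# Hypothetical compensation values with a three-way tie at the top
comp = pd.Series(
    [2000000.0, 50000.0, 2000000.0, 2000000.0, 75000.0],
    index=[101, 102, 103, 104, 105],
    name="ConvertedComp",
)

# Same values as sort_values(ascending=False).head(2)
top2 = comp.nlargest(2)
print(top2.tolist())  # [2000000.0, 2000000.0]
print(top2.tolist() == comp.sort_values(ascending=False).head(2).tolist())  # True

# keep="all" keeps every value tied with the 2nd largest, so 3 rows come back
print(len(comp.nlargest(2, keep="all")))  # 3
```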
In [23]:
# Return the full DataFrame rows with the 10 largest ConvertedComp values
df.nlargest(10, 'ConvertedComp')
# The opposite: df.nsmallest(10, 'ConvertedComp')

| Respondent | MainBranch | Hobbyist | OpenSourcer | OpenSource | Employment | Country | Student | EdLevel | UndergradMajor | EduOther | ... | WelcomeChange | SONewContent | Age | Gender | Trans | Sexuality | Ethnicity | Dependents | SurveyLength | SurveyEase |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25983 | I am a developer by profession | Yes | Less than once per year | OSS is, on average, of HIGHER quality than pro... | Employed full-time | Canada | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | Received on-the-job training in software devel... | ... | Just as welcome now as I felt last year | NaN | 24.0 | Man | No | Straight / Heterosexual | White or of European descent | No | Appropriate in length | Easy |
| 87896 | I am a developer by profession | Yes | Less than once per year | The quality of OSS and closed source software ... | Employed full-time | Germany | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | Taken an online course in programming or softw... | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Tech... | 32.0 | Man | No | Gay or Lesbian | White or of European descent | No | Appropriate in length | Neither easy nor difficult |
| 22013 | I am a developer by profession | Yes | Never | The quality of OSS and closed source software ... | Employed full-time | India | No | Professional degree (JD, MD, etc.) | A natural science (ex. biology, chemistry, phy... | Taken an online course in programming or softw... | ... | A lot more welcome now than last year | Tech articles written by other developers;Indu... | NaN | Man | No | Straight / Heterosexual | NaN | Yes | Too long | Easy |
| 28243 | I am a developer by profession | Yes | Once a month or more often | OSS is, on average, of HIGHER quality than pro... | Independent contractor, freelancer, or self-em... | India | No | Master’s degree (MA, MS, M.Eng., MBA, etc.) | Computer science, computer engineering, or sof... | Taken an online course in programming or softw... | ... | A lot less welcome now than last year | Tech meetups or events in your area | NaN | NaN | NaN | Straight / Heterosexual | NaN | Yes | Too short | Easy |
| 72732 | I am not primarily a developer, but I write co... | No | Less than once a month but more than once per ... | OSS is, on average, of LOWER quality than prop... | NaN | India | Yes, full-time | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | Contributed to open source software | ... | A lot less welcome now than last year | Tech articles written by other developers;Tech... | NaN | Man | No | NaN | NaN | Yes | Too long | Easy |
| 78151 | I am a developer by profession | Yes | Never | OSS is, on average, of HIGHER quality than pro... | Employed full-time | Mexico | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | Taken an online course in programming or softw... | ... | Just as welcome now as I felt last year | Tech meetups or events in your area;Courses on... | 32.0 | Man | No | Straight / Heterosexual | Hispanic or Latino/Latina | No | Appropriate in length | Easy |
| 80200 | I am a developer by profession | Yes | Never | OSS is, on average, of LOWER quality than prop... | Employed full-time | Netherlands | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | Received on-the-job training in software devel... | ... | NaN | Tech articles written by other developers | 25.0 | Woman | No | Bisexual | White or of European descent | No | Appropriate in length | Easy |
| 52132 | I am a developer by profession | Yes | Less than once a month but more than once per ... | OSS is, on average, of HIGHER quality than pro... | Employed full-time | Peru | No | Some college/university study without earning ... | I never declared a major | Completed an industry certification program (e... | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Tech... | 48.0 | Man | NaN | NaN | Black or of African descent;East Asian;Hispani... | Yes | Appropriate in length | Easy |
| 75561 | I am a developer by profession | Yes | Less than once a month but more than once per ... | The quality of OSS and closed source software ... | Employed full-time | Singapore | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | A humanities discipline (ex. literature, histo... | Taken an online course in programming or softw... | ... | Just as welcome now as I felt last year | Tech meetups or events in your area | 37.0 | Man | No | Straight / Heterosexual | White or of European descent | Yes | Appropriate in length | Easy |
| 32250 | I am a developer by profession | Yes | Once a month or more often | The quality of OSS and closed source software ... | Employed full-time | Switzerland | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | Taken a part-time in-person course in programm... | ... | Just as welcome now as I felt last year | Industry news about technologies you're intere... | 30.0 | Man | No | Straight / Heterosexual | White or of European descent | No | Appropriate in length | Easy |

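`DataFrame.nlargest` returns whole rows, and passing a list of columns uses the later columns to break ties in the earlier ones; `nsmallest` is the mirror image. A sketch on a few hypothetical rows (countries, compensation and ages are made up for illustration):

```python
import pandas as pd

# Hypothetical rows with a tie in ConvertedComp
survey = pd.DataFrame({
    "Country": ["Canada", "India", "Peru", "Germany"],
    "ConvertedComp": [2000000.0, 2000000.0, 48000.0, 95000.0],
    "Age": [24.0, 30.0, 48.0, 32.0],
})

# Order by ConvertedComp, then by Age to break the tie; full rows come back
top = survey.nlargest(2, ["ConvertedComp", "Age"])
print(top["Country"].tolist())  # ['India', 'Canada']

# nsmallest picks the lowest values instead
print(survey.nsmallest(1, "ConvertedComp")["Country"].tolist())  # ['Peru']
```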

## Python Pandas Tutorial (Part 8): Grouping and Aggregating - Analyzing and Exploring Your Data