# Pandas Tutorial
### Indexes - How to Set, Reset, and Use Indexes
- **Source:** Corey Schafer - [Python Pandas Playlist](https://www.youtube.com/playlist?list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS)

- Data used from [StackOverflow Developer Survey 2021](https://insights.stackoverflow.com/survey)

In [11]:
import pandas as pd

### Creating a sample DataFrame

In [12]:
people = {
    "first" : ["Parampreet", "Corey", "Anant"],
    "last" : ["Singh", "Schafer", "Luthra"],
    "email" : ["ParampreetSingh@gmail.com", "CoreySchafer@gmail.com", "AnantLuthra@gmail.com"]
}

In [13]:
df = pd.DataFrame(people)
df

,first,last,email
0,Parampreet,Singh,ParampreetSingh@gmail.com
1,Corey,Schafer,CoreySchafer@gmail.com
2,Anant,Luthra,AnantLuthra@gmail.com


In [14]:
df["email"]

0    ParampreetSingh@gmail.com
1       CoreySchafer@gmail.com
2        AnantLuthra@gmail.com
Name: email, dtype: object

### To set custom index from the existing columns, use `.set_index()`
- Pass `inplace=True` to modify the working DataFrame directly; otherwise a new DataFrame is returned.

In [15]:
# Set the DataFrame's index from one of its existing columns.
# With inplace=False (the default), set_index returns a modified DataFrame instead.
df.set_index("email", inplace=True)
df

email,first,last
ParampreetSingh@gmail.com,Parampreet,Singh
CoreySchafer@gmail.com,Corey,Schafer
AnantLuthra@gmail.com,Anant,Luthra
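
As a minimal sketch of the default (non-inplace) behaviour described above, the snippet below rebuilds the sample data and keeps the returned DataFrame in a separate variable. The names `original` and `indexed` are illustrative only and do not appear in the notebook.

```python
import pandas as pd

people = {
    "first": ["Parampreet", "Corey", "Anant"],
    "last": ["Singh", "Schafer", "Luthra"],
    "email": ["ParampreetSingh@gmail.com", "CoreySchafer@gmail.com", "AnantLuthra@gmail.com"],
}
original = pd.DataFrame(people)  # illustrative name, not from the notebook

# Without inplace=True, set_index returns a new DataFrame
# and leaves `original` with its default integer index.
indexed = original.set_index("email")

print(original.columns.tolist())  # "email" is still a regular column here
print(indexed.index.name)         # the new DataFrame is indexed by "email"
```

Assigning the result back (`df = df.set_index("email")`) has the same effect on `df` as `inplace=True`.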


In [16]:
# Gets the index of the DataFrame
df.index

Index(['ParampreetSingh@gmail.com', 'CoreySchafer@gmail.com',
       'AnantLuthra@gmail.com'],
      dtype='object', name='email')

In [17]:
df.loc["CoreySchafer@gmail.com":]

email,first,last
CoreySchafer@gmail.com,Corey,Schafer
AnantLuthra@gmail.com,Anant,Luthra


In [18]:
df.iloc[0]

first    Parampreet
last          Singh
Name: ParampreetSingh@gmail.com, dtype: object
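
The two cells above use `.loc` (label-based) and `.iloc` (position-based). The sketch below contrasts them on a small frame built with the same email index; `df_demo` and the other variable names are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"first": ["Parampreet", "Corey", "Anant"], "last": ["Singh", "Schafer", "Luthra"]},
    index=pd.Index(
        ["ParampreetSingh@gmail.com", "CoreySchafer@gmail.com", "AnantLuthra@gmail.com"],
        name="email",
    ),
)

# .loc selects by label: one row label and one column name give a single value.
last_name = df_demo.loc["CoreySchafer@gmail.com", "last"]

# A label slice with .loc includes both endpoints.
label_slice = df_demo.loc["ParampreetSingh@gmail.com":"CoreySchafer@gmail.com"]

# A position slice with .iloc excludes the stop position, like a Python list.
position_slice = df_demo.iloc[0:1]

print(last_name)
print(len(label_slice), len(position_slice))
```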

- To reset the index to the default integer index, use `.reset_index()`
- Pass `inplace=True` to modify the working DataFrame directly.

In [19]:
# Resetting the index to the default; "email" becomes a regular column again
df.reset_index(inplace=True)
df

,email,first,last
0,ParampreetSingh@gmail.com,Parampreet,Singh
1,CoreySchafer@gmail.com,Corey,Schafer
2,AnantLuthra@gmail.com,Anant,Luthra
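
By default `.reset_index()` moves the old index back into a column, as the output above shows. If the old index is not needed, pandas also accepts `drop=True`, which discards it. The sketch below compares the two; `df_demo`, `kept`, and `dropped` are illustrative names.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"first": ["Corey", "Anant"], "last": ["Schafer", "Luthra"]},
    index=pd.Index(["CoreySchafer@gmail.com", "AnantLuthra@gmail.com"], name="email"),
)

# Default: the "email" index is restored as the first column.
kept = df_demo.reset_index()

# drop=True: the "email" labels are discarded entirely.
dropped = df_demo.reset_index(drop=True)

print(kept.columns.tolist())
print(dropped.columns.tolist())
```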


In [20]:
df = pd.read_csv("./data/survey_results_public.csv", index_col="ResponseId")
schema_df = pd.read_csv("./data/survey_results_schema.csv", index_col="qname")
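
The `index_col` argument used above sets the index at load time, which avoids a separate `.set_index()` call. The sketch below shows the difference on a tiny in-memory CSV, assuming a three-column excerpt of the survey file (the values are taken from the first two rows shown in this notebook); `csv_text`, `with_index`, and `without_index` are illustrative names.

```python
import io

import pandas as pd

# A small stand-in for the survey CSV, so the example runs without the data file.
csv_text = (
    "ResponseId,Country,Age\n"
    "1,Slovakia,25-34 years old\n"
    "2,Netherlands,18-24 years old\n"
)

# With index_col, the named column becomes the index while reading.
with_index = pd.read_csv(io.StringIO(csv_text), index_col="ResponseId")

# Without it, pandas adds a default integer index and keeps ResponseId as a column.
without_index = pd.read_csv(io.StringIO(csv_text))

print(with_index.loc[2, "Country"])
print(without_index.columns.tolist())
```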

In [21]:
# Show every column when displaying the DataFrame (df.shape[1] is the column count)
pd.set_option("display.max_columns", df.shape[1])

In [22]:
df.head()

ResponseId,MainBranch,Employment,Country,US_State,UK_Country,EdLevel,Age1stCode,LearnCode,YearsCode,YearsCodePro,DevType,OrgSize,Currency,CompTotal,CompFreq,LanguageHaveWorkedWith,LanguageWantToWorkWith,DatabaseHaveWorkedWith,DatabaseWantToWorkWith,PlatformHaveWorkedWith,PlatformWantToWorkWith,WebframeHaveWorkedWith,WebframeWantToWorkWith,MiscTechHaveWorkedWith,MiscTechWantToWorkWith,ToolsTechHaveWorkedWith,ToolsTechWantToWorkWith,NEWCollabToolsHaveWorkedWith,NEWCollabToolsWantToWorkWith,OpSys,NEWStuck,NEWSOSites,SOVisitFreq,SOAccount,SOPartFreq,SOComm,NEWOtherComms,Age,Gender,Trans,Sexuality,Ethnicity,Accessibility,MentalHealth,SurveyLength,SurveyEase,ConvertedCompYearly
1,I am a developer by profession,"Independent contractor, freelancer, or self-em...",Slovakia,,,"Secondary school (e.g. American high school, G...",18 - 24 years,Coding Bootcamp;Other online resources (ex: vi...,,,"Developer, mobile",20 to 99 employees,EUR European Euro,4800.0,Monthly,C++;HTML/CSS;JavaScript;Objective-C;PHP;Swift,Swift,PostgreSQL;SQLite,SQLite,,,Laravel;Symfony,,,,,,PHPStorm;Xcode,Atom;Xcode,MacOS,Call a coworker or friend;Visit Stack Overflow...,Stack Overflow,Multiple times per day,Yes,A few times per month or weekly,"Yes, definitely",No,25-34 years old,Man,No,Straight / Heterosexual,White or of European descent,None of the above,None of the above,Appropriate in length,Easy,62268.0
2,I am a student who is learning to code,"Student, full-time",Netherlands,,,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",11 - 17 years,"Other online resources (ex: videos, blogs, etc...",7.0,,,,,,,JavaScript;Python,,PostgreSQL,,,,Angular;Flask;Vue.js,,Cordova,,Docker;Git;Yarn,Git,Android Studio;IntelliJ;Notepad++;PyCharm,,Windows,Visit Stack Overflow;Google it,Stack Overflow,Daily or almost daily,Yes,Daily or almost daily,"Yes, definitely",No,18-24 years old,Man,No,Straight / Heterosexual,White or of European descent,None of the above,None of the above,Appropriate in length,Easy,
3,"I am not primarily a developer, but I write co...","Student, full-time",Russian Federation,,,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",11 - 17 years,"Other online resources (ex: videos, blogs, etc...",,,,,,,,Assembly;C;Python;R;Rust,Julia;Python;Rust,SQLite,SQLite,Heroku,,Flask,Flask,NumPy;Pandas;TensorFlow;Torch/PyTorch,Keras;NumPy;Pandas;TensorFlow;Torch/PyTorch,,,IPython/Jupyter;PyCharm;RStudio;Sublime Text;V...,IPython/Jupyter;RStudio;Sublime Text;Visual St...,MacOS,Visit Stack Overflow;Google it;Watch help / tu...,Stack Overflow;Stack Exchange,Multiple times per day,Yes,Multiple times per day,"Yes, definitely",Yes,18-24 years old,Man,No,Prefer not to say,Prefer not to say,None of the above,None of the above,Appropriate in length,Easy,
4,I am a developer by profession,Employed full-time,Austria,,,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",11 - 17 years,,,,"Developer, front-end",100 to 499 employees,EUR European Euro,,Monthly,JavaScript;TypeScript,JavaScript;TypeScript,,,,,Angular;jQuery,Angular;jQuery,,,,,,,Windows,Call a coworker or friend;Visit Stack Overflow...,Stack Overflow,Daily or almost daily,Yes,Daily or almost daily,Neutral,No,35-44 years old,Man,No,Straight / Heterosexual,White or of European descent,I am deaf / hard of hearing,,Appropriate in length,Neither easy nor difficult,
5,I am a developer by profession,"Independent contractor, freelancer, or self-em...",United Kingdom of Great Britain and Northern I...,,England,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",5 - 10 years,Friend or family member,17.0,10.0,"Developer, desktop or enterprise applications;...","Just me - I am a freelancer, sole proprietor, ...",GBP\tPound sterling,,,Bash/Shell;HTML/CSS;Python;SQL,Bash/Shell;HTML/CSS;Python;SQL,Elasticsearch;PostgreSQL;Redis,Cassandra;Elasticsearch;PostgreSQL;Redis,,,Flask,Flask,Apache Spark;Hadoop;NumPy;Pandas,Hadoop;NumPy;Pandas,Docker;Git;Kubernetes;Yarn,Docker;Git;Kubernetes;Yarn,Atom;IPython/Jupyter;Notepad++;PyCharm;Vim,Atom;IPython/Jupyter;Notepad++;PyCharm;Vim;Vis...,Linux-based,Visit Stack Overflow;Go for a walk or other ph...,Stack Overflow;Stack Exchange,Daily or almost daily,Yes,A few times per week,"Yes, somewhat",No,25-34 years old,Man,No,,White or of European descent,None of the above,,Appropriate in length,Easy,


In [23]:
schema_df.head()

qname,qid,question,force_resp,type,selector
S0,QID16,"<div><span style=""font-size:19px;""><strong>Hel...",False,DB,TB
MetaInfo,QID12,Browser Meta Info,False,Meta,Browser
S1,QID1,"<span style=""font-size:22px; font-family: aria...",False,DB,TB
MainBranch,QID2,Which of the following options best describes ...,True,MC,SAVR
Employment,QID24,Which of the following best describes your cur...,False,MC,MAVR


In [24]:
schema_df.loc["Gender", "question"]

'Which of the following describe you, if any? Please check all that apply.'

In [25]:
df.columns

Index(['MainBranch', 'Employment', 'Country', 'US_State', 'UK_Country',
       'EdLevel', 'Age1stCode', 'LearnCode', 'YearsCode', 'YearsCodePro',
       'DevType', 'OrgSize', 'Currency', 'CompTotal', 'CompFreq',
       'LanguageHaveWorkedWith', 'LanguageWantToWorkWith',
       'DatabaseHaveWorkedWith', 'DatabaseWantToWorkWith',
       'PlatformHaveWorkedWith', 'PlatformWantToWorkWith',
       'WebframeHaveWorkedWith', 'WebframeWantToWorkWith',
       'MiscTechHaveWorkedWith', 'MiscTechWantToWorkWith',
       'ToolsTechHaveWorkedWith', 'ToolsTechWantToWorkWith',
       'NEWCollabToolsHaveWorkedWith', 'NEWCollabToolsWantToWorkWith', 'OpSys',
       'NEWStuck', 'NEWSOSites', 'SOVisitFreq', 'SOAccount', 'SOPartFreq',
       'SOComm', 'NEWOtherComms', 'Age', 'Gender', 'Trans', 'Sexuality',
       'Ethnicity', 'Accessibility', 'MentalHealth', 'SurveyLength',
       'SurveyEase', 'ConvertedCompYearly'],
      dtype='object')

### To sort the DataFrame based on Index, we use `sort_index()`
- Set `ascending=False` to sort in descending order of the index.
- Pass `inplace=True` to modify the working DataFrame directly.

In [26]:
schema_df.sort_index(inplace=True)

In [27]:
schema_df.index

Index(['Accessibility', 'Age', 'Age1stCode', 'CompFreq', 'CompTotal',
       'Country', 'Currency', 'Database', 'DevType', 'EdLevel', 'Employment',
       'Ethnicity', 'Gender', 'Language', 'LearnCode', 'MainBranch',
       'MentalHealth', 'MetaInfo', 'MiscTech', 'NEWCollabTools',
       'NEWOtherComms', 'NEWOtherCommsNames', 'NEWSOSites', 'NEWStuck',
       'OpSys', 'OrgSize', 'Platform', 'S0', 'S1', 'S2', 'S3', 'S4', 'S5',
       'S6', 'SOAccount', 'SOComm', 'SOPartFreq', 'SOVisitFreq', 'Sexuality',
       'SurveyEase', 'SurveyLength', 'ToolsTech', 'Trans', 'UK_Country',
       'US_State', 'Webframe', 'YearsCode', 'YearsCodePro'],
      dtype='object', name='qname')
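
The `ascending=False` option mentioned above is not exercised in the notebook, so here is a minimal sketch of it. It assumes a four-row excerpt of the schema (the `qname` and `qid` values come from the `schema_df.head()` output); `schema_demo`, `ascending`, and `descending` are illustrative names.

```python
import pandas as pd

schema_demo = pd.DataFrame(
    {"qid": ["QID16", "QID12", "QID2", "QID24"]},
    index=pd.Index(["S0", "MetaInfo", "MainBranch", "Employment"], name="qname"),
)

# Default: ascending order of the index labels.
ascending = schema_demo.sort_index()

# ascending=False reverses the order; without inplace=True, schema_demo is unchanged.
descending = schema_demo.sort_index(ascending=False)

print(ascending.index.tolist())
print(descending.index.tolist())
```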