# Indexes 

## What are indexes or labels?
An index (or label) is typically a unique identifier for a row. By default, pandas numbers the rows from 0 to n-1 (for n rows), so row 0, 1, 2, etc. However, you can customize the index and set it to a column's values. For example, you can use the `email` column as the index of a user data frame, since emails are typically unique. Rows are then identified by their email values rather than by the default zero-based numbering.

In [None]:
'''
+ Ex. 1: Setting and manipulating indexes manually.
'''
import pandas as pd
users = {
  'first': ["Kevin", "Abby"],
  'last': ["Nguyen", "Wendel"],
  'email': ["knguyen44@gmail.com", "abbyWendel@outlook.com"]
}
usersDF = pd.DataFrame(users)

# Update data frame so that it uses 'email' for the indexing
usersDF = usersDF.set_index("email") 

# then we want to reset the index
usersDF = usersDF.reset_index()
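
To make the effect of `set_index` and `reset_index` more concrete, here is a minimal sketch that reuses the same two users. The variable names (`users_df`, `by_email`, `restored`) are made up for illustration and are not part of the original example.

```python
import pandas as pd

users_df = pd.DataFrame({
    "first": ["Kevin", "Abby"],
    "last": ["Nguyen", "Wendel"],
    "email": ["knguyen44@gmail.com", "abbyWendel@outlook.com"],
})

# With no index specified, rows are labeled 0, 1, ...
default_labels = list(users_df.index)

# set_index returns a new data frame whose row labels are the emails
by_email = users_df.set_index("email")

# Rows can now be looked up by email with .loc (row label, then column name)
abby_last = by_email.loc["abbyWendel@outlook.com", "last"]

# 'email' has moved into the index, so it is no longer a regular column
email_is_column = "email" in by_email.columns

# reset_index moves 'email' back into a column and restores the 0..n-1 labels
restored = by_email.reset_index()
restored_columns = list(restored.columns)
```

Note that `reset_index` places the former index at the front, so the column order after the round trip is `email`, `first`, `last` rather than the original order.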

In [None]:
'''
+ Ex. 2: Setting the index while loading a CSV. Here we set the index (unique identifier for rows) to the 
'Respondent' column, which is just the ID of the respondent. For the schema, we know that each 'Column' value is unique,
so using it as the index lets us quickly look up what a certain column means based on its name.
'''
import pandas as pd
csvPath = "../data/survey_results_public.csv"
df = pd.read_csv(csvPath, index_col="Respondent")

schemaCsvPath = "../data/survey_results_schema.csv"
dataSchemaDF = pd.read_csv(schemaCsvPath, index_col="Column")

# Sort the rows in our data frame so that the index labels are in alphabetical order.
# (Pass ascending=False instead to get reverse alphabetical order.)
dataSchemaDF = dataSchemaDF.sort_index()


pd.set_option("display.max_columns", 10)
pd.set_option("display.max_rows", 10)

# We can use '.loc' to search based on the index value!
print(dataSchemaDF.loc["MgrIdiot"])


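# Call head() to preview the first five rows; without the parentheses,
# print receives the bound method instead of its result.
print(df.head())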
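
Since the cell above depends on CSV files on disk, here is a small self-contained sketch of the same ideas (a label-based index, `sort_index`, and `.loc`). The data frame below is a hypothetical stand-in for the schema file: the `QuestionText` column name and the placeholder descriptions are assumptions for illustration, not the real survey contents.

```python
import pandas as pd

# Hypothetical stand-in for the schema CSV (placeholder text, not real survey data)
schema_df = pd.DataFrame({
    "Column": ["Respondent", "MainBranch", "Hobbyist"],
    "QuestionText": ["placeholder A", "placeholder B", "placeholder C"],
}).set_index("Column")

# sort_index() sorts by row label: ascending (A to Z) by default
sorted_az = list(schema_df.sort_index().index)

# ascending=False reverses the order (Z to A)
sorted_za = list(schema_df.sort_index(ascending=False).index)

# .loc takes a row label and, optionally, a column name
hobbyist_text = schema_df.loc["Hobbyist", "QuestionText"]

# head is a method, so it needs parentheses; head(2) returns the first two rows
first_two = schema_df.head(2)
```

Sorting does not change which labels exist, only their order, so `.loc` lookups return the same rows before and after `sort_index`.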

