## First Steps with Pandas

### **Pandas DataFrame**

- A **DataFrame** is a **two-dimensional table** (like a spreadsheet or SQL table).
- It consists of **rows and columns**, where each column can have a different data type.

---

### **Key Characteristics**

- **Two-dimensional**: Contains rows and columns.
- **Indexed**: Rows and columns have labels (or indexes).
- **Heterogeneous**: Different columns can hold different data types.
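
The three characteristics can be checked on a small hand-built table. This is only a sketch: the `people` DataFrame and its values are invented for illustration and are not part of the Titanic data.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
people = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cleo"],
        "age": [31, 45, 27],
        "fare": [7.25, 71.28, 8.05],
    },
    index=["p1", "p2", "p3"],  # row labels
)

print(people.shape)    # two-dimensional: (rows, columns)
print(people.index)    # indexed: row labels
print(people.columns)  # indexed: column labels
print(people.dtypes)   # heterogeneous: one dtype per column
```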


### Creating your very first Pandas DataFrame (from a CSV file)

In [1]:
import pandas as pd

In [3]:
pd.read_csv("titanic.csv")

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.2500,S,
1,1,1,female,38.0,1,0,71.2833,C,C
2,1,3,female,26.0,0,0,7.9250,S,
3,1,1,female,35.0,1,0,53.1000,S,C
4,0,3,male,35.0,0,0,8.0500,S,
...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,
887,1,1,female,19.0,0,0,30.0000,S,B
888,0,3,female,,1,2,23.4500,S,
889,1,1,male,26.0,0,0,30.0000,C,C


In [2]:
titanic = pd.read_csv("titanic.csv")

In [5]:
titanic

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.2500,S,
1,1,1,female,38.0,1,0,71.2833,C,C
2,1,3,female,26.0,0,0,7.9250,S,
3,1,1,female,35.0,1,0,53.1000,S,C
4,0,3,male,35.0,0,0,8.0500,S,
...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,
887,1,1,female,19.0,0,0,30.0000,S,B
888,0,3,female,,1,2,23.4500,S,
889,1,1,male,26.0,0,0,30.0000,C,C


### Pandas Display Options and the methods head() & tail()

### **head()**
- Displays the **first few rows** of a DataFrame or Series.
- By default, it shows the **first 5 rows**, but you can specify a different number of rows.

---

### **tail()**
- Displays the **last few rows** of a DataFrame or Series.
- By default, it shows the **last 5 rows**, but you can specify a different number of rows.
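
As a small sketch on a toy DataFrame (the `numbers` frame is made up for illustration), both methods take an optional row count. A negative count is also accepted and means "everything except that many rows from the other end".

```python
import pandas as pd

numbers = pd.DataFrame({"value": range(10)})  # toy data: 0..9

first_three = numbers.head(3)        # rows 0, 1, 2
last_two = numbers.tail(2)           # rows 8, 9
all_but_last_two = numbers.head(-2)  # negative n: all rows except the last 2
```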


In [3]:
titanic

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.2500,S,
1,1,1,female,38.0,1,0,71.2833,C,C
2,1,3,female,26.0,0,0,7.9250,S,
3,1,1,female,35.0,1,0,53.1000,S,C
4,0,3,male,35.0,0,0,8.0500,S,
...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,
887,1,1,female,19.0,0,0,30.0000,S,B
888,0,3,female,,1,2,23.4500,S,
889,1,1,male,26.0,0,0,30.0000,C,C


In [7]:
print(titanic)

     survived  pclass     sex   age  sibsp  parch     fare embarked deck
0           0       3    male  22.0      1      0   7.2500        S  NaN
1           1       1  female  38.0      1      0  71.2833        C    C
2           1       3  female  26.0      0      0   7.9250        S  NaN
3           1       1  female  35.0      1      0  53.1000        S    C
4           0       3    male  35.0      0      0   8.0500        S  NaN
..        ...     ...     ...   ...    ...    ...      ...      ...  ...
886         0       2    male  27.0      0      0  13.0000        S  NaN
887         1       1  female  19.0      0      0  30.0000        S    B
888         0       3  female   NaN      1      2  23.4500        S  NaN
889         1       1    male  26.0      0      0  30.0000        C    C
890         0       3    male  32.0      0      0   7.7500        Q  NaN

[891 rows x 9 columns]


In [5]:
# maximum number of rows displayed when printing or viewing a DataFrame.
pd.options.display.max_rows=800
pd.options.display.max_rows

800

In [9]:
pd.options.display.min_rows

10
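
Assigning to `pd.options.display.max_rows` changes the setting for the rest of the session. If the change should apply to only a few lines, `pd.option_context` can scope it. The sketch below uses a made-up `frame` and an arbitrary limit of 200.

```python
import pandas as pd

frame = pd.DataFrame({"value": range(100)})  # toy data
default_max_rows = pd.get_option("display.max_rows")

# Inside the block the limit is 200; the previous value is
# restored automatically when the block exits.
with pd.option_context("display.max_rows", 200):
    inside = pd.get_option("display.max_rows")
    print(len(frame), "rows, display limit:", inside)

restored = pd.get_option("display.max_rows")
```

`pd.reset_option("display.max_rows")` returns the option to the pandas default after a permanent change.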

In [16]:
titanic

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.2500,S,
1,1,1,female,38.0,1,0,71.2833,C,C
2,1,3,female,26.0,0,0,7.9250,S,
3,1,1,female,35.0,1,0,53.1000,S,C
4,0,3,male,35.0,0,0,8.0500,S,
...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,
887,1,1,female,19.0,0,0,30.0000,S,B
888,0,3,female,,1,2,23.4500,S,
889,1,1,male,26.0,0,0,30.0000,C,C


In [6]:
titanic.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.25,S,
1,1,1,female,38.0,1,0,71.2833,C,C
2,1,3,female,26.0,0,0,7.925,S,
3,1,1,female,35.0,1,0,53.1,S,C
4,0,3,male,35.0,0,0,8.05,S,


In [17]:
titanic.head(800)

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.25,S,
1,1,1,female,38.0,1,0,71.2833,C,C
2,1,3,female,26.0,0,0,7.925,S,
3,1,1,female,35.0,1,0,53.1,S,C
4,0,3,male,35.0,0,0,8.05,S,
5,0,3,male,,0,0,8.4583,Q,
6,0,1,male,54.0,0,0,51.8625,S,E
7,0,3,male,2.0,3,1,21.075,S,
8,1,3,female,27.0,0,2,11.1333,S,
9,1,2,female,14.0,1,0,30.0708,C,


In [13]:
titanic.tail()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
886,0,2,male,27.0,0,0,13.0,S,
887,1,1,female,19.0,0,0,30.0,S,B
888,0,3,female,,1,2,23.45,S,
889,1,1,male,26.0,0,0,30.0,C,C
890,0,3,male,32.0,0,0,7.75,Q,


In [14]:
titanic.tail(2)

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
889,1,1,male,26.0,0,0,30.0,C,C
890,0,3,male,32.0,0,0,7.75,Q,


### First Data Inspection

In [7]:
titanic

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.2500,S,
1,1,1,female,38.0,1,0,71.2833,C,C
2,1,3,female,26.0,0,0,7.9250,S,
3,1,1,female,35.0,1,0,53.1000,S,C
4,0,3,male,35.0,0,0,8.0500,S,
...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,
887,1,1,female,19.0,0,0,30.0000,S,B
888,0,3,female,,1,2,23.4500,S,
889,1,1,male,26.0,0,0,30.0000,C,C


### titanic.info() Summary of the DataFrame

- **Displays a summary** of the DataFrame.
- **Key Information Includes**:
  - **Number of rows and columns**.
  - **Column names**.
  - **Non-null values** in each column.
  - **Data types** of each column.
  - **Memory usage**.


In [8]:
titanic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   survived  891 non-null    int64  
 1   pclass    891 non-null    int64  
 2   sex       891 non-null    object 
 3   age       714 non-null    float64
 4   sibsp     891 non-null    int64  
 5   parch     891 non-null    int64  
 6   fare      891 non-null    float64
 7   embarked  889 non-null    object 
 8   deck      203 non-null    object 
dtypes: float64(2), int64(4), object(3)
memory usage: 62.8+ KB


In [9]:
titanic.describe()  # generate summary statistics for a DataFrame

Unnamed: 0,survived,pclass,age,sibsp,parch,fare
count,891.0,891.0,714.0,891.0,891.0,891.0
mean,0.383838,2.308642,29.699118,0.523008,0.381594,32.204208
std,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429
min,0.0,1.0,0.42,0.0,0.0,0.0
25%,0.0,2.0,20.125,0.0,0.0,7.9104
50%,0.0,3.0,28.0,0.0,0.0,14.4542
75%,1.0,3.0,38.0,1.0,0.0,31.0
max,1.0,3.0,80.0,8.0,6.0,512.3292


In [12]:
titanic.describe(include="object")  # categorical data (only object columns)

Unnamed: 0,sex,embarked,deck
count,891,889,203
unique,2,3,7
top,male,S,C
freq,577,644,59


In [11]:
float_summary = titanic.describe(include=["float64"])
print("\nSummary of Float Columns:")
print(float_summary)


Summary of Float Columns:
              age        fare
count  714.000000  891.000000
mean    29.699118   32.204208
std     14.526497   49.693429
min      0.420000    0.000000
25%     20.125000    7.910400
50%     28.000000   14.454200
75%     38.000000   31.000000
max     80.000000  512.329200


In [13]:
int_summary = titanic.describe(include=["int64"])
print("\nSummary of Int Columns:")
print(int_summary)


Summary of Int Columns:
         survived      pclass       sibsp       parch
count  891.000000  891.000000  891.000000  891.000000
mean     0.383838    2.308642    0.523008    0.381594
std      0.486592    0.836071    1.102743    0.806057
min      0.000000    1.000000    0.000000    0.000000
25%      0.000000    2.000000    0.000000    0.000000
50%      0.000000    3.000000    0.000000    0.000000
75%      1.000000    3.000000    1.000000    0.000000
max      1.000000    3.000000    8.000000    6.000000


### Python Built-in Functions & DataFrame Attributes and Methods

In [14]:
titanic

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.2500,S,
1,1,1,female,38.0,1,0,71.2833,C,C
2,1,3,female,26.0,0,0,7.9250,S,
3,1,1,female,35.0,1,0,53.1000,S,C
4,0,3,male,35.0,0,0,8.0500,S,
...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,
887,1,1,female,19.0,0,0,30.0000,S,B
888,0,3,female,,1,2,23.4500,S,
889,1,1,male,26.0,0,0,30.0000,C,C


#### DataFrames and Python Built-in Functions

In [15]:
type(titanic)

pandas.core.frame.DataFrame

In [16]:
len(titanic)

891

In [17]:
round(titanic, 0)
# The built-in round() delegates to DataFrame.round(), which rounds the numeric
# columns to the given number of decimal places and leaves other columns unchanged.



Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.0,S,
1,1,1,female,38.0,1,0,71.0,C,C
2,1,3,female,26.0,0,0,8.0,S,
3,1,1,female,35.0,1,0,53.0,S,C
4,0,3,male,35.0,0,0,8.0,S,
...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0,S,
887,1,1,female,19.0,0,0,30.0,S,B
888,0,3,female,,1,2,23.0,S,
889,1,1,male,26.0,0,0,30.0,C,C


In [None]:
#int(titanic)  # raises TypeError: a DataFrame cannot be converted to a single integer

In [None]:
min(titanic)  # iterates over the column labels, so it returns the alphabetically smallest column name

'age'
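
The result `'age'` follows from how Python iterates over a DataFrame: iteration yields the column labels, not the values. A minimal sketch with an invented two-column frame shows the difference between the built-in `min()` and the `DataFrame.min()` method.

```python
import pandas as pd

toy = pd.DataFrame({"fare": [7.25, 71.28], "age": [22.0, 38.0]})  # toy data

labels = list(toy)           # iterating a DataFrame yields its column labels
smallest_label = min(toy)    # so the built-in min() compares the labels as strings
smallest_values = toy.min()  # the method computes the minimum of each column
```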

#### DataFrame Attributes

In [20]:
titanic.shape

(891, 9)

In [21]:
titanic.size

8019

In [22]:
titanic.index

RangeIndex(start=0, stop=891, step=1)

In [23]:
titanic.columns

Index(['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare',
       'embarked', 'deck'],
      dtype='object')

#### DataFrame Methods

In [24]:
titanic.head(n = 2)

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.25,S,
1,1,1,female,38.0,1,0,71.2833,C,C


In [25]:
titanic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   survived  891 non-null    int64  
 1   pclass    891 non-null    int64  
 2   sex       891 non-null    object 
 3   age       714 non-null    float64
 4   sibsp     891 non-null    int64  
 5   parch     891 non-null    int64  
 6   fare      891 non-null    float64
 7   embarked  889 non-null    object 
 8   deck      203 non-null    object 
dtypes: float64(2), int64(4), object(3)
memory usage: 62.8+ KB


In [None]:
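# titanic.min()  # old: relied on older pandas versions skipping columns that could not be reduced

In [30]:
titanic.min(numeric_only=True)  # new: restrict the reduction to numeric columns explicitly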

survived    0.00
pclass      1.00
age         0.42
sibsp       0.00
parch       0.00
fare        0.00
dtype: float64

#### Method Chaining

In [None]:
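# titanic.mean().sort_values().head(2)  # old: relied on non-numeric columns being skipped automatically

In [31]:
titanic.mean(numeric_only=True).sort_values().head(2)  # new: numeric columns only, then sort and keep the two smallest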

parch       0.381594
survived    0.383838
dtype: float64
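
There are at least two ways to keep a reduction away from text columns before chaining further methods. The sketch below uses an invented three-row frame; `select_dtypes` is offered as an alternative, not as the approach used in this notebook.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "sex": ["male", "female", "female"],
        "age": [22.0, 38.0, 26.0],
        "fare": [7.25, 71.28, 7.93],
    }
)  # toy data

# Option 1: ask the reduction to skip non-numeric columns.
means_a = toy.mean(numeric_only=True)

# Option 2: select the numeric columns first, then reduce.
means_b = toy.select_dtypes(include="number").mean()

# Chaining: sort the means in ascending order and keep the first entry.
smallest_mean = means_a.sort_values().head(1)
```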

### Selecting Columns

In [None]:
type(titanic)

pandas.core.frame.DataFrame

In [33]:
titanic["age"]

0      22.0
1      38.0
2      26.0
3      35.0
4      35.0
       ... 
886    27.0
887    19.0
888     NaN
889    26.0
890    32.0
Name: age, Length: 891, dtype: float64

In [34]:
type(titanic["age"])

pandas.core.series.Series

In [None]:
#titanic["age", "sex"]

In [36]:
titanic[["age", "sex"]]

Unnamed: 0,age,sex
0,22.0,male
1,38.0,female
2,26.0,female
3,35.0,female
4,35.0,male
...,...,...
886,27.0,male
887,19.0,female
888,,female
889,26.0,male


In [37]:
type(titanic[["age", "sex"]])

pandas.core.frame.DataFrame

In [38]:
titanic[["sex", "age"]]

Unnamed: 0,sex,age
0,male,22.0
1,female,38.0
2,female,26.0
3,female,35.0
4,male,35.0
...,...,...
886,male,27.0
887,female,19.0
888,female,
889,male,26.0


In [None]:
titanic[["sex", "age", "h"]]  # "h" is not a column label, so pandas raises a KeyError

KeyError: "['h'] not in index"

In [40]:
type(titanic[["age"]])

pandas.core.frame.DataFrame

### Selecting one Column with "dot notation"

In [42]:
titanic.age

0      22.0
1      38.0
2      26.0
3      35.0
4      35.0
       ... 
886    27.0
887    19.0
888     NaN
889    26.0
890    32.0
Name: age, Length: 891, dtype: float64

In [43]:
titanic.age.equals(titanic["age"])

True

In [44]:
titanic.embarked

0      S
1      C
2      S
3      S
4      S
      ..
886    S
887    S
888    S
889    C
890    Q
Name: embarked, Length: 891, dtype: object
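
Dot notation only works when the column label is a valid Python identifier and does not clash with an existing DataFrame attribute or method. The sketch below uses an invented frame whose column names were chosen to show both cases; bracket notation works in either.

```python
import pandas as pd

toy = pd.DataFrame({"count": [3, 5], "first name": ["Ann", "Bob"]})  # toy data

attr = toy.count            # the DataFrame.count method, not the column
column = toy["count"]       # bracket notation returns the column as a Series
spaced = toy["first name"]  # labels containing spaces need brackets
```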

### Position-based Indexing and Slicing with iloc[]

### **What is `iloc[]` in Pandas?**

- **`iloc[]`** is short for "integer-location based indexing." It is an indexer used with square brackets, not a method called with parentheses.
- It selects rows and columns in a Pandas DataFrame or Series by their **integer positions** (index numbers).

---

### **Key Points about `iloc[]`**

1. **Indexes by Position**:
   - Rows and columns are selected using their position (starting from 0).

2. **Syntax**:
   ```python
   DataFrame.iloc[row_position, column_position]
   ```


In [45]:
import pandas as pd

In [46]:
summer = pd.read_csv("summer.csv", index_col = "Athlete")

In [47]:
summer

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"HAJOS, Alfred",1896,Athens,Aquatics,Swimming,HUN,Men,100M Freestyle,Gold
"HERSCHMANN, Otto",1896,Athens,Aquatics,Swimming,AUT,Men,100M Freestyle,Silver
"DRIVAS, Dimitrios",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Bronze
"MALOKINIS, Ioannis",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Gold
"CHASAPIS, Spiridon",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Silver
...,...,...,...,...,...,...,...,...
"JANIKOWSKI, Damian",2012,London,Wrestling,Wrestling Freestyle,POL,Men,Wg 84 KG,Bronze
"REZAEI, Ghasem Gholamreza",2012,London,Wrestling,Wrestling Freestyle,IRI,Men,Wg 96 KG,Gold
"TOTROV, Rustam",2012,London,Wrestling,Wrestling Freestyle,RUS,Men,Wg 96 KG,Silver
"ALEKSANYAN, Artur",2012,London,Wrestling,Wrestling Freestyle,ARM,Men,Wg 96 KG,Bronze


In [48]:
summer.info()

<class 'pandas.core.frame.DataFrame'>
Index: 31165 entries, HAJOS, Alfred to LIDBERG, Jimmy
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Year        31165 non-null  int64 
 1   City        31165 non-null  object
 2   Sport       31165 non-null  object
 3   Discipline  31165 non-null  object
 4   Country     31161 non-null  object
 5   Gender      31165 non-null  object
 6   Event       31165 non-null  object
 7   Medal       31165 non-null  object
dtypes: int64(1), object(7)
memory usage: 2.1+ MB


#### Selecting Rows with iloc[]

In [49]:
summer.iloc[0]  # selects the first row of the summer DataFrame.

Year                    1896
City                  Athens
Sport               Aquatics
Discipline          Swimming
Country                  HUN
Gender                   Men
Event         100M Freestyle
Medal                   Gold
Name: HAJOS, Alfred, dtype: object

In [50]:
type(summer.iloc[0])
# Columns: Year, City, Sport, Discipline, Country, Gender, Event, Medal (Athlete is the index)

pandas.core.series.Series

In [51]:
summer.iloc[1]

Year                    1896
City                  Athens
Sport               Aquatics
Discipline          Swimming
Country                  AUT
Gender                   Men
Event         100M Freestyle
Medal                 Silver
Name: HERSCHMANN, Otto, dtype: object

In [52]:
summer.iloc[-1]

Year                         2012
City                       London
Sport                   Wrestling
Discipline    Wrestling Freestyle
Country                       SWE
Gender                        Men
Event                    Wg 96 KG
Medal                      Bronze
Name: LIDBERG, Jimmy, dtype: object

In [53]:
summer.iloc[[1, 2, 3]]  # select rows at index positions 1, 2, and 3.

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"HERSCHMANN, Otto",1896,Athens,Aquatics,Swimming,AUT,Men,100M Freestyle,Silver
"DRIVAS, Dimitrios",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Bronze
"MALOKINIS, Ioannis",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Gold


In [54]:
summer.iloc[1:4]

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"HERSCHMANN, Otto",1896,Athens,Aquatics,Swimming,AUT,Men,100M Freestyle,Silver
"DRIVAS, Dimitrios",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Bronze
"MALOKINIS, Ioannis",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Gold


In [55]:
summer.iloc[:5]

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"HAJOS, Alfred",1896,Athens,Aquatics,Swimming,HUN,Men,100M Freestyle,Gold
"HERSCHMANN, Otto",1896,Athens,Aquatics,Swimming,AUT,Men,100M Freestyle,Silver
"DRIVAS, Dimitrios",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Bronze
"MALOKINIS, Ioannis",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Gold
"CHASAPIS, Spiridon",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Silver


In [56]:
summer.iloc[-5:]  # selects the last 5 rows

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"JANIKOWSKI, Damian",2012,London,Wrestling,Wrestling Freestyle,POL,Men,Wg 84 KG,Bronze
"REZAEI, Ghasem Gholamreza",2012,London,Wrestling,Wrestling Freestyle,IRI,Men,Wg 96 KG,Gold
"TOTROV, Rustam",2012,London,Wrestling,Wrestling Freestyle,RUS,Men,Wg 96 KG,Silver
"ALEKSANYAN, Artur",2012,London,Wrestling,Wrestling Freestyle,ARM,Men,Wg 96 KG,Bronze
"LIDBERG, Jimmy",2012,London,Wrestling,Wrestling Freestyle,SWE,Men,Wg 96 KG,Bronze


In [57]:
summer.iloc[:]

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"HAJOS, Alfred",1896,Athens,Aquatics,Swimming,HUN,Men,100M Freestyle,Gold
"HERSCHMANN, Otto",1896,Athens,Aquatics,Swimming,AUT,Men,100M Freestyle,Silver
"DRIVAS, Dimitrios",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Bronze
"MALOKINIS, Ioannis",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Gold
"CHASAPIS, Spiridon",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Silver
...,...,...,...,...,...,...,...,...
"JANIKOWSKI, Damian",2012,London,Wrestling,Wrestling Freestyle,POL,Men,Wg 84 KG,Bronze
"REZAEI, Ghasem Gholamreza",2012,London,Wrestling,Wrestling Freestyle,IRI,Men,Wg 96 KG,Gold
"TOTROV, Rustam",2012,London,Wrestling,Wrestling Freestyle,RUS,Men,Wg 96 KG,Silver
"ALEKSANYAN, Artur",2012,London,Wrestling,Wrestling Freestyle,ARM,Men,Wg 96 KG,Bronze


In [58]:
summer.iloc[[2, 45, 5467]]  # selects the 3rd, 46th, and 5468th rows

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"DRIVAS, Dimitrios",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Bronze
"PERSAKIS, Ioannis",1896,Athens,Athletics,Athletics,GRE,Men,Triple Jump,Bronze
"POLAK, Ans",1928,Amsterdam,Gymnastics,Artistic G.,NED,Women,Team Competition,Gold


#### Indexing/Slicing Rows and Columns with iloc[]

In [59]:
summer.head(10)

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"HAJOS, Alfred",1896,Athens,Aquatics,Swimming,HUN,Men,100M Freestyle,Gold
"HERSCHMANN, Otto",1896,Athens,Aquatics,Swimming,AUT,Men,100M Freestyle,Silver
"DRIVAS, Dimitrios",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Bronze
"MALOKINIS, Ioannis",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Gold
"CHASAPIS, Spiridon",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Silver
"CHOROPHAS, Efstathios",1896,Athens,Aquatics,Swimming,GRE,Men,1200M Freestyle,Bronze
"HAJOS, Alfred",1896,Athens,Aquatics,Swimming,HUN,Men,1200M Freestyle,Gold
"ANDREOU, Joannis",1896,Athens,Aquatics,Swimming,GRE,Men,1200M Freestyle,Silver
"CHOROPHAS, Efstathios",1896,Athens,Aquatics,Swimming,GRE,Men,400M Freestyle,Bronze
"NEUMANN, Paul",1896,Athens,Aquatics,Swimming,AUT,Men,400M Freestyle,Gold


In [60]:
summer.iloc[0, 4]
# 0: The first row (index position 0).
# 4: The fifth column (index position 4), because positions are zero-based.
# Columns by position: Year (0), City (1), Sport (2), Discipline (3), Country (4), Gender (5), Event (6), Medal (7).
# Athlete is the index here, so it does not count as a column.

'HUN'

In [61]:
summer.iloc[0, :3]
# 0: This selects the first row (index 0).
# :3: This selects the first three columns (indices 0, 1, and 2).

Year         1896
City       Athens
Sport    Aquatics
Name: HAJOS, Alfred, dtype: object

In [None]:
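summer.iloc[0, [0, 2, 5, 7]]
# Row 0 and the columns at positions 0, 2, 5 and 7 (Year, Sport, Gender, Medal).
# iloc[] accepts positions only; passing a column label instead raises:

ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types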

In [None]:
summer.iloc[34:39, [0, 2, 5, 7]]

#### Selecting Columns with iloc[]

In [63]:
summer.iloc[:, 4].equals(summer.Country)

True

In [64]:
summer["Country"]

Athlete
HAJOS, Alfred                HUN
HERSCHMANN, Otto             AUT
DRIVAS, Dimitrios            GRE
MALOKINIS, Ioannis           GRE
CHASAPIS, Spiridon           GRE
                            ... 
JANIKOWSKI, Damian           POL
REZAEI, Ghasem Gholamreza    IRI
TOTROV, Rustam               RUS
ALEKSANYAN, Artur            ARM
LIDBERG, Jimmy               SWE
Name: Country, Length: 31165, dtype: object
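
The `iloc[]` patterns from this section can be tried without `summer.csv`. The following is a minimal sketch on an invented three-row frame; the `toy` name, its values and its index labels are assumptions made for illustration.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "Year": [1896, 1896, 1900],
        "City": ["Athens", "Athens", "Paris"],
        "Medal": ["Gold", "Silver", "Gold"],
    },
    index=["A", "B", "C"],
)  # toy data

cell = toy.iloc[0, 1]          # first row, second column -> scalar
row = toy.iloc[-1]             # last row -> Series
block = toy.iloc[0:2, [0, 2]]  # rows 0 and 1 (end excluded), columns 0 and 2 -> DataFrame
col = toy.iloc[:, 2]           # every row of the third column -> Series
```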

### Label-based Indexing and Slicing with loc[] 

## What is `loc[]`?

In **Pandas**, `loc[]` is an indexer used to **select rows and columns** from a **DataFrame** by their **labels** (names) rather than their **position** (index numbers). It is written with square brackets, not called like a method.

### Why Use `loc[]`?

Think of it like a **library catalog**. If you want to find a book, you can look it up by its **title** or **author** (label), rather than by its **location** (position). In a similar way, `loc[]` allows you to access specific rows and columns in a dataset using their names.

## How Does `loc[]` Work?

- **Rows**: With `loc[]`, you can select a specific row by using its **label** (not its position).
- **Columns**: You can also select specific columns by their **name** (not position).

### Syntax:
```python
dataframe.loc[row_label, column_label]
```


In [65]:
summer

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"HAJOS, Alfred",1896,Athens,Aquatics,Swimming,HUN,Men,100M Freestyle,Gold
"HERSCHMANN, Otto",1896,Athens,Aquatics,Swimming,AUT,Men,100M Freestyle,Silver
"DRIVAS, Dimitrios",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Bronze
"MALOKINIS, Ioannis",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Gold
"CHASAPIS, Spiridon",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Silver
...,...,...,...,...,...,...,...,...
"JANIKOWSKI, Damian",2012,London,Wrestling,Wrestling Freestyle,POL,Men,Wg 84 KG,Bronze
"REZAEI, Ghasem Gholamreza",2012,London,Wrestling,Wrestling Freestyle,IRI,Men,Wg 96 KG,Gold
"TOTROV, Rustam",2012,London,Wrestling,Wrestling Freestyle,RUS,Men,Wg 96 KG,Silver
"ALEKSANYAN, Artur",2012,London,Wrestling,Wrestling Freestyle,ARM,Men,Wg 96 KG,Bronze


#### Selecting Rows with loc[]

In [66]:
summer.iloc[2]

Year                                1896
City                              Athens
Sport                           Aquatics
Discipline                      Swimming
Country                              GRE
Gender                               Men
Event         100M Freestyle For Sailors
Medal                             Bronze
Name: DRIVAS, Dimitrios, dtype: object

In [67]:
summer.loc["DRIVAS, Dimitrios"]

Year                                1896
City                              Athens
Sport                           Aquatics
Discipline                      Swimming
Country                              GRE
Gender                               Men
Event         100M Freestyle For Sailors
Medal                             Bronze
Name: DRIVAS, Dimitrios, dtype: object

In [68]:
summer.loc["PHELPS, Michael"]

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"PHELPS, Michael",2004,Athens,Aquatics,Swimming,USA,Men,100M Butterfly,Gold
"PHELPS, Michael",2004,Athens,Aquatics,Swimming,USA,Men,200M Butterfly,Gold
"PHELPS, Michael",2004,Athens,Aquatics,Swimming,USA,Men,200M Freestyle,Bronze
"PHELPS, Michael",2004,Athens,Aquatics,Swimming,USA,Men,200M Individual Medley,Gold
"PHELPS, Michael",2004,Athens,Aquatics,Swimming,USA,Men,400M Individual Medley,Gold
"PHELPS, Michael",2004,Athens,Aquatics,Swimming,USA,Men,4X100M Freestyle Relay,Bronze
"PHELPS, Michael",2004,Athens,Aquatics,Swimming,USA,Men,4X100M Medley Relay,Gold
"PHELPS, Michael",2004,Athens,Aquatics,Swimming,USA,Men,4X200M Freestyle Relay,Gold
"PHELPS, Michael",2008,Beijing,Aquatics,Swimming,USA,Men,100M Butterfly,Gold
"PHELPS, Michael",2008,Beijing,Aquatics,Swimming,USA,Men,200M Butterfly,Gold


#### Indexing/Slicing Rows and Columns with loc[]

In [69]:
summer.loc["PHELPS, Michael", "Medal"]

Athlete
PHELPS, Michael      Gold
PHELPS, Michael      Gold
PHELPS, Michael    Bronze
PHELPS, Michael      Gold
PHELPS, Michael      Gold
PHELPS, Michael    Bronze
PHELPS, Michael      Gold
PHELPS, Michael      Gold
PHELPS, Michael      Gold
PHELPS, Michael      Gold
PHELPS, Michael      Gold
PHELPS, Michael      Gold
PHELPS, Michael      Gold
PHELPS, Michael      Gold
PHELPS, Michael      Gold
PHELPS, Michael      Gold
PHELPS, Michael      Gold
PHELPS, Michael    Silver
PHELPS, Michael      Gold
PHELPS, Michael    Silver
PHELPS, Michael      Gold
PHELPS, Michael      Gold
Name: Medal, dtype: object

In [70]:
summer.loc["PHELPS, Michael", ["Medal", "Event"]]

Unnamed: 0_level_0,Medal,Event
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1
"PHELPS, Michael",Gold,100M Butterfly
"PHELPS, Michael",Gold,200M Butterfly
"PHELPS, Michael",Bronze,200M Freestyle
"PHELPS, Michael",Gold,200M Individual Medley
"PHELPS, Michael",Gold,400M Individual Medley
"PHELPS, Michael",Bronze,4X100M Freestyle Relay
"PHELPS, Michael",Gold,4X100M Medley Relay
"PHELPS, Michael",Gold,4X200M Freestyle Relay
"PHELPS, Michael",Gold,100M Butterfly
"PHELPS, Michael",Gold,200M Butterfly


In [72]:
summer.loc[["PHELPS, Michael", "LEWIS, Carl"], ["Medal", "Event"]]

Unnamed: 0_level_0,Medal,Event
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1
"PHELPS, Michael",Gold,100M Butterfly
"PHELPS, Michael",Gold,200M Butterfly
"PHELPS, Michael",Bronze,200M Freestyle
"PHELPS, Michael",Gold,200M Individual Medley
"PHELPS, Michael",Gold,400M Individual Medley
"PHELPS, Michael",Bronze,4X100M Freestyle Relay
"PHELPS, Michael",Gold,4X100M Medley Relay
"PHELPS, Michael",Gold,4X200M Freestyle Relay
"PHELPS, Michael",Gold,100M Butterfly
"PHELPS, Michael",Gold,200M Butterfly


In [73]:
summer.loc[:, ["Medal", "Event"]]

Unnamed: 0_level_0,Medal,Event
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1
"HAJOS, Alfred",Gold,100M Freestyle
"HERSCHMANN, Otto",Silver,100M Freestyle
"DRIVAS, Dimitrios",Bronze,100M Freestyle For Sailors
"MALOKINIS, Ioannis",Gold,100M Freestyle For Sailors
"CHASAPIS, Spiridon",Silver,100M Freestyle For Sailors
...,...,...
"JANIKOWSKI, Damian",Bronze,Wg 84 KG
"REZAEI, Ghasem Gholamreza",Gold,Wg 96 KG
"TOTROV, Rustam",Silver,Wg 96 KG
"ALEKSANYAN, Artur",Bronze,Wg 96 KG


In [74]:
summer.head(10)

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"HAJOS, Alfred",1896,Athens,Aquatics,Swimming,HUN,Men,100M Freestyle,Gold
"HERSCHMANN, Otto",1896,Athens,Aquatics,Swimming,AUT,Men,100M Freestyle,Silver
"DRIVAS, Dimitrios",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Bronze
"MALOKINIS, Ioannis",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Gold
"CHASAPIS, Spiridon",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Silver
"CHOROPHAS, Efstathios",1896,Athens,Aquatics,Swimming,GRE,Men,1200M Freestyle,Bronze
"HAJOS, Alfred",1896,Athens,Aquatics,Swimming,HUN,Men,1200M Freestyle,Gold
"ANDREOU, Joannis",1896,Athens,Aquatics,Swimming,GRE,Men,1200M Freestyle,Silver
"CHOROPHAS, Efstathios",1896,Athens,Aquatics,Swimming,GRE,Men,400M Freestyle,Bronze
"NEUMANN, Paul",1896,Athens,Aquatics,Swimming,AUT,Men,400M Freestyle,Gold


In [75]:
summer.loc[:"CHASAPIS, Spiridon"]

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"HAJOS, Alfred",1896,Athens,Aquatics,Swimming,HUN,Men,100M Freestyle,Gold
"HERSCHMANN, Otto",1896,Athens,Aquatics,Swimming,AUT,Men,100M Freestyle,Silver
"DRIVAS, Dimitrios",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Bronze
"MALOKINIS, Ioannis",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Gold
"CHASAPIS, Spiridon",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Silver


In [None]:
# summer.loc[:"PHELPS, Michael"]  # raises KeyError: the slice bound is not a unique label

In [None]:
# summer.loc["PHELPS, Michael":]  # raises KeyError for the same reason

KeyError: "Cannot get left slice bound for non-unique label: 'PHELPS, Michael'"
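
The error can be reproduced on a small frame. In this sketch (toy data, invented labels) the index is unsorted and the label `"a"` appears twice, so pandas cannot decide where a slice starting at `"a"` should begin, while a slice between unique labels still works.

```python
import pandas as pd

toy = pd.DataFrame(
    {"medal": ["Gold", "Silver", "Bronze", "Gold"]},
    index=["a", "b", "a", "c"],
)  # toy data with a duplicated, unsorted index

print(toy.index.is_unique)  # False: "a" appears twice

unique_slice = toy.loc["b":"c"]  # both bounds are unique labels, so this works

try:
    toy.loc["a":]  # ambiguous start: "a" is duplicated
    slice_failed = False
except KeyError as err:
    slice_failed = True
    print(err)
```

Checking `index.is_unique` before slicing by label is one cheap way to anticipate this failure.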

In [77]:
summer.head(20)

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"HAJOS, Alfred",1896,Athens,Aquatics,Swimming,HUN,Men,100M Freestyle,Gold
"HERSCHMANN, Otto",1896,Athens,Aquatics,Swimming,AUT,Men,100M Freestyle,Silver
"DRIVAS, Dimitrios",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Bronze
"MALOKINIS, Ioannis",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Gold
"CHASAPIS, Spiridon",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Silver
"CHOROPHAS, Efstathios",1896,Athens,Aquatics,Swimming,GRE,Men,1200M Freestyle,Bronze
"HAJOS, Alfred",1896,Athens,Aquatics,Swimming,HUN,Men,1200M Freestyle,Gold
"ANDREOU, Joannis",1896,Athens,Aquatics,Swimming,GRE,Men,1200M Freestyle,Silver
"CHOROPHAS, Efstathios",1896,Athens,Aquatics,Swimming,GRE,Men,400M Freestyle,Bronze
"NEUMANN, Paul",1896,Athens,Aquatics,Swimming,AUT,Men,400M Freestyle,Gold


In [78]:
summer.loc["DRIVAS, Dimitrios":"BLAKE, Arthur", "City":"Discipline"]

Unnamed: 0_level_0,City,Sport,Discipline
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
"DRIVAS, Dimitrios",Athens,Aquatics,Swimming
"MALOKINIS, Ioannis",Athens,Aquatics,Swimming
"CHASAPIS, Spiridon",Athens,Aquatics,Swimming
"CHOROPHAS, Efstathios",Athens,Aquatics,Swimming
"HAJOS, Alfred",Athens,Aquatics,Swimming
"ANDREOU, Joannis",Athens,Aquatics,Swimming
"CHOROPHAS, Efstathios",Athens,Aquatics,Swimming
"NEUMANN, Paul",Athens,Aquatics,Swimming
"PEPANOS, Antonios",Athens,Aquatics,Swimming
"LANE, Francis",Athens,Athletics,Athletics


In [None]:
#summer.loc[["PHELPS, Michael", "DUCK, Donald"]]

In [None]:
#summer.loc["PHELPS, Michael", ["Year", "Age"]]

### Key Differences Between `loc[]` and `iloc[]`

| Feature                | `loc[]` (Label-based)                         | `iloc[]` (Integer-based)                        |
|------------------------|----------------------------------------------|------------------------------------------------|
| **Indexing Type**       | Uses labels (index/column names)             | Uses integer positions (0-based index)         |
| **Slicing Behavior**    | Inclusive of the end value (`end_label`)      | Exclusive of the end value (`end_position`)    |
| **Row Selection**       | `df.loc['row_label']`                        | `df.iloc[row_position]`                        |
| **Column Selection**    | `df.loc[:, 'column_label']`                  | `df.iloc[:, column_position]`                  |
| **Conditional Selection**| Accepts a boolean Series or array, e.g. `df.loc[df['col'] > 0]` | Accepts a boolean array or list, but not a boolean Series |

#### When to Use
- **Use `loc[]`** when you want to access rows and columns by their **labels** (like `"LEWIS, Carl"` or `"Medal"`).
- **Use `iloc[]`** when you want to access rows and columns by their **integer positions** (e.g., the 3rd row or the 2nd column).
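
The slicing and boolean-selection rows of the comparison can be checked on a small frame. This is a sketch with made-up data (`toy` and its labels are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical frame with string row labels.
toy = pd.DataFrame(
    {"City": ["Athens", "Paris", "London", "Seoul"],
     "Year": [1896, 1900, 1908, 1988]},
    index=["a", "b", "c", "d"],
)

by_label = toy.loc["a":"c"]    # label slice: the end label "c" is included
by_position = toy.iloc[0:2]    # position slice: position 2 is excluded

# Boolean selection: loc takes the boolean Series directly,
# while for iloc the mask is converted to a plain array first.
mask = toy["Year"] > 1900
filtered = toy.loc[mask]
filtered_pos = toy.iloc[mask.to_numpy()]

print(len(by_label), len(by_position))
print(filtered)
```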


### Indexing and Slicing with reindex()

In [92]:
import pandas as pd

In [93]:
summer = pd.read_csv("summer.csv")

In [94]:
summer

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
0,1896,Athens,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100M Freestyle,Gold
1,1896,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100M Freestyle,Silver
2,1896,Athens,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100M Freestyle For Sailors,Bronze
3,1896,Athens,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100M Freestyle For Sailors,Gold
4,1896,Athens,Aquatics,Swimming,"CHASAPIS, Spiridon",GRE,Men,100M Freestyle For Sailors,Silver
...,...,...,...,...,...,...,...,...,...
31160,2012,London,Wrestling,Wrestling Freestyle,"JANIKOWSKI, Damian",POL,Men,Wg 84 KG,Bronze
31161,2012,London,Wrestling,Wrestling Freestyle,"REZAEI, Ghasem Gholamreza",IRI,Men,Wg 96 KG,Gold
31162,2012,London,Wrestling,Wrestling Freestyle,"TOTROV, Rustam",RUS,Men,Wg 96 KG,Silver
31163,2012,London,Wrestling,Wrestling Freestyle,"ALEKSANYAN, Artur",ARM,Men,Wg 96 KG,Bronze


In [99]:
summer1=summer.iloc[0:5,[0, 5]]
print(summer1)

   Year Country
0  1896     HUN
1  1896     AUT
2  1896     GRE
3  1896     GRE
4  1896     GRE


In [None]:
summer.reindex(index = [0, 5, 30000, 40000], columns =  ["Athlete", "Medal", "Age"])

# reindex() returns a new DataFrame with the specified new row and column labels. 
# The original DataFrame remains unchanged.

Unnamed: 0,Athlete,Medal,Age
0,"HAJOS, Alfred",Gold,
5,"CHOROPHAS, Efstathios",Bronze,
30000,"PAUTARAN, Maryna",Bronze,
40000,,,
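
The output above shows that `reindex()` fills labels it cannot find (row `40000`, column `Age`) with missing values instead of raising an error. Below is a minimal sketch of the same behavior on a hypothetical three-row frame. The names `toy` and `result` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame with the default integer index 0, 1, 2.
toy = pd.DataFrame(
    {"Athlete": ["X", "Y", "Z"], "Medal": ["Gold", "Silver", "Bronze"]}
)

# Row label 10 and column "Age" do not exist; reindex() creates them as NaN.
result = toy.reindex(index=[0, 2, 10], columns=["Athlete", "Medal", "Age"])
print(result)

# The original frame is left untouched.
print(toy.columns.to_list())
```

By contrast, `toy.loc[[0, 2, 10]]` would raise a `KeyError` for the missing label, which is the main practical difference between the two.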


In [104]:
summer = pd.read_csv("summer.csv", index_col = "Athlete")

In [105]:
summer

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"HAJOS, Alfred",1896,Athens,Aquatics,Swimming,HUN,Men,100M Freestyle,Gold
"HERSCHMANN, Otto",1896,Athens,Aquatics,Swimming,AUT,Men,100M Freestyle,Silver
"DRIVAS, Dimitrios",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Bronze
"MALOKINIS, Ioannis",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Gold
"CHASAPIS, Spiridon",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Silver
...,...,...,...,...,...,...,...,...
"JANIKOWSKI, Damian",2012,London,Wrestling,Wrestling Freestyle,POL,Men,Wg 84 KG,Bronze
"REZAEI, Ghasem Gholamreza",2012,London,Wrestling,Wrestling Freestyle,IRI,Men,Wg 96 KG,Gold
"TOTROV, Rustam",2012,London,Wrestling,Wrestling Freestyle,RUS,Men,Wg 96 KG,Silver
"ALEKSANYAN, Artur",2012,London,Wrestling,Wrestling Freestyle,ARM,Men,Wg 96 KG,Bronze


In [91]:
summer.reindex(columns = ["Medal", "Age"])
# Note: summer(columns=["Medal", "Age"]) would fail, because a DataFrame is not callable.

Unnamed: 0,Medal,Age
0,Gold,
1,Silver,
2,Bronze,
3,Gold,
4,Silver,
...,...,...
31160,Bronze,
31161,Gold,
31162,Silver,
31163,Bronze,


In [None]:
#summer.reindex(index = ["PHELPS, Michael"], columns = ["Medal", "Age"])
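
The call above is presumably commented out because reindexing rows does not work when the row index contains duplicate labels. The sketch below uses a hypothetical frame (`toy` is an assumption) to illustrate this. In the pandas versions I am aware of, the failure is a `ValueError`, and `loc` with a list of labels is one possible alternative.

```python
import pandas as pd

# Hypothetical frame in which the label "P" occurs twice.
toy = pd.DataFrame(
    {"Medal": ["Gold", "Silver", "Bronze"]},
    index=pd.Index(["P", "Q", "P"], name="Athlete"),
)

# Reindexing rows on an axis with duplicate labels is rejected.
try:
    toy.reindex(index=["P"])
    reindex_error = None
except ValueError as err:
    reindex_error = err

# loc returns every row that carries the label.
rows = toy.loc[["P"], ["Medal"]]
print(rows)
```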

### Summary and Outlook

#### Importing from CSV and first Inspection

In [106]:
import pandas as pd

In [107]:
summer = pd.read_csv("summer.csv", index_col = "Athlete")

In [108]:
summer

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"HAJOS, Alfred",1896,Athens,Aquatics,Swimming,HUN,Men,100M Freestyle,Gold
"HERSCHMANN, Otto",1896,Athens,Aquatics,Swimming,AUT,Men,100M Freestyle,Silver
"DRIVAS, Dimitrios",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Bronze
"MALOKINIS, Ioannis",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Gold
"CHASAPIS, Spiridon",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Silver
...,...,...,...,...,...,...,...,...
"JANIKOWSKI, Damian",2012,London,Wrestling,Wrestling Freestyle,POL,Men,Wg 84 KG,Bronze
"REZAEI, Ghasem Gholamreza",2012,London,Wrestling,Wrestling Freestyle,IRI,Men,Wg 96 KG,Gold
"TOTROV, Rustam",2012,London,Wrestling,Wrestling Freestyle,RUS,Men,Wg 96 KG,Silver
"ALEKSANYAN, Artur",2012,London,Wrestling,Wrestling Freestyle,ARM,Men,Wg 96 KG,Bronze


In [109]:
summer.info()

<class 'pandas.core.frame.DataFrame'>
Index: 31165 entries, HAJOS, Alfred to LIDBERG, Jimmy
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Year        31165 non-null  int64 
 1   City        31165 non-null  object
 2   Sport       31165 non-null  object
 3   Discipline  31165 non-null  object
 4   Country     31161 non-null  object
 5   Gender      31165 non-null  object
 6   Event       31165 non-null  object
 7   Medal       31165 non-null  object
dtypes: int64(1), object(7)
memory usage: 2.1+ MB


#### Selecting one Column

In [110]:
summer.Medal

Athlete
HAJOS, Alfred                  Gold
HERSCHMANN, Otto             Silver
DRIVAS, Dimitrios            Bronze
MALOKINIS, Ioannis             Gold
CHASAPIS, Spiridon           Silver
                              ...  
JANIKOWSKI, Damian           Bronze
REZAEI, Ghasem Gholamreza      Gold
TOTROV, Rustam               Silver
ALEKSANYAN, Artur            Bronze
LIDBERG, Jimmy               Bronze
Name: Medal, Length: 31165, dtype: object

In [111]:
summer["Medal"]

Athlete
HAJOS, Alfred                  Gold
HERSCHMANN, Otto             Silver
DRIVAS, Dimitrios            Bronze
MALOKINIS, Ioannis             Gold
CHASAPIS, Spiridon           Silver
                              ...  
JANIKOWSKI, Damian           Bronze
REZAEI, Ghasem Gholamreza      Gold
TOTROV, Rustam               Silver
ALEKSANYAN, Artur            Bronze
LIDBERG, Jimmy               Bronze
Name: Medal, Length: 31165, dtype: object
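
Both notations return the same Series here, but they are not always interchangeable. The sketch below uses a hypothetical frame (the column names `Event Name` and `count` are invented for illustration) to show two cases where only the bracket notation reaches the column.

```python
import pandas as pd

# Hypothetical frame: one column name contains a space, another shadows a method.
toy = pd.DataFrame({"Medal": ["Gold"], "Event Name": ["100M"], "count": [1]})

# For a plain name, attribute and bracket access agree.
same = toy.Medal.equals(toy["Medal"])

# A name with a space can only be reached with brackets.
event = toy["Event Name"]

# toy.count is the DataFrame.count method, not the "count" column.
count_attr = toy.count
count_col = toy["count"]

print(same, callable(count_attr), count_col.to_list())
```

Bracket notation is therefore the safer default; attribute access is mainly a typing convenience.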

#### Selecting multiple Columns

In [112]:
summer[["Year", "Medal"]]

Unnamed: 0_level_0,Year,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1
"HAJOS, Alfred",1896,Gold
"HERSCHMANN, Otto",1896,Silver
"DRIVAS, Dimitrios",1896,Bronze
"MALOKINIS, Ioannis",1896,Gold
"CHASAPIS, Spiridon",1896,Silver
...,...,...
"JANIKOWSKI, Damian",2012,Bronze
"REZAEI, Ghasem Gholamreza",2012,Gold
"TOTROV, Rustam",2012,Silver
"ALEKSANYAN, Artur",2012,Bronze


In [113]:
summer.loc[:, ["Year", "Medal"]]

Unnamed: 0_level_0,Year,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1
"HAJOS, Alfred",1896,Gold
"HERSCHMANN, Otto",1896,Silver
"DRIVAS, Dimitrios",1896,Bronze
"MALOKINIS, Ioannis",1896,Gold
"CHASAPIS, Spiridon",1896,Silver
...,...,...
"JANIKOWSKI, Damian",2012,Bronze
"REZAEI, Ghasem Gholamreza",2012,Gold
"TOTROV, Rustam",2012,Silver
"ALEKSANYAN, Artur",2012,Bronze


#### Selecting positional rows

In [114]:
summer.iloc[10:21]

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"PEPANOS, Antonios",1896,Athens,Aquatics,Swimming,GRE,Men,400M Freestyle,Silver
"LANE, Francis",1896,Athens,Athletics,Athletics,USA,Men,100M,Bronze
"SZOKOLYI, Alajos",1896,Athens,Athletics,Athletics,HUN,Men,100M,Bronze
"BURKE, Thomas",1896,Athens,Athletics,Athletics,USA,Men,100M,Gold
"HOFMANN, Fritz",1896,Athens,Athletics,Athletics,GER,Men,100M,Silver
"CURTIS, Thomas",1896,Athens,Athletics,Athletics,USA,Men,110M Hurdles,Gold
"GOULDING, Grantley",1896,Athens,Athletics,Athletics,GBR,Men,110M Hurdles,Silver
"LERMUSIAUX, Albin",1896,Athens,Athletics,Athletics,FRA,Men,1500M,Bronze
"FLACK, Edwin",1896,Athens,Athletics,Athletics,AUS,Men,1500M,Gold
"BLAKE, Arthur",1896,Athens,Athletics,Athletics,USA,Men,1500M,Silver


#### Selecting labeled rows

In [115]:
summer.loc["LEWIS, Carl"]

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"LEWIS, Carl",1984,Los Angeles,Athletics,Athletics,USA,Men,100M,Gold
"LEWIS, Carl",1984,Los Angeles,Athletics,Athletics,USA,Men,200M,Gold
"LEWIS, Carl",1984,Los Angeles,Athletics,Athletics,USA,Men,4X100M Relay,Gold
"LEWIS, Carl",1984,Los Angeles,Athletics,Athletics,USA,Men,Long Jump,Gold
"LEWIS, Carl",1988,Seoul,Athletics,Athletics,USA,Men,100M,Gold
"LEWIS, Carl",1988,Seoul,Athletics,Athletics,USA,Men,200M,Silver
"LEWIS, Carl",1988,Seoul,Athletics,Athletics,USA,Men,Long Jump,Gold
"LEWIS, Carl",1992,Barcelona,Athletics,Athletics,USA,Men,4X100M Relay,Gold
"LEWIS, Carl",1992,Barcelona,Athletics,Athletics,USA,Men,Long Jump,Gold
"LEWIS, Carl",1996,Atlanta,Athletics,Athletics,USA,Men,Long Jump,Gold


#### Putting it all together

In [116]:
summer[["Year", "Event", "Medal"]].loc["LEWIS, Carl"]

Unnamed: 0_level_0,Year,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
"LEWIS, Carl",1984,100M,Gold
"LEWIS, Carl",1984,200M,Gold
"LEWIS, Carl",1984,4X100M Relay,Gold
"LEWIS, Carl",1984,Long Jump,Gold
"LEWIS, Carl",1988,100M,Gold
"LEWIS, Carl",1988,200M,Silver
"LEWIS, Carl",1988,Long Jump,Gold
"LEWIS, Carl",1992,4X100M Relay,Gold
"LEWIS, Carl",1992,Long Jump,Gold
"LEWIS, Carl",1996,Long Jump,Gold


In [117]:
summer.loc["LEWIS, Carl"][["Year", "Event", "Medal"]]

Unnamed: 0_level_0,Year,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
"LEWIS, Carl",1984,100M,Gold
"LEWIS, Carl",1984,200M,Gold
"LEWIS, Carl",1984,4X100M Relay,Gold
"LEWIS, Carl",1984,Long Jump,Gold
"LEWIS, Carl",1988,100M,Gold
"LEWIS, Carl",1988,200M,Silver
"LEWIS, Carl",1988,Long Jump,Gold
"LEWIS, Carl",1992,4X100M Relay,Gold
"LEWIS, Carl",1992,Long Jump,Gold
"LEWIS, Carl",1996,Long Jump,Gold


In [None]:
summer.loc["LEWIS, Carl", ["Year", "Event", "Medal"]]
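
The three variants in this section select the same data. The sketch below checks that on a hypothetical miniature frame (`toy` and its rows are made up for illustration).

```python
import pandas as pd

# Hypothetical miniature of the medal table with a repeated athlete label.
toy = pd.DataFrame(
    {
        "Year": [1984, 1988, 1984],
        "City": ["Los Angeles", "Seoul", "Los Angeles"],
        "Event": ["100M", "Long Jump", "Marathon"],
        "Medal": ["Gold", "Gold", "Gold"],
    },
    index=pd.Index(["LEWIS, Carl", "LEWIS, Carl", "LOPES, Carlos"], name="Athlete"),
)
cols = ["Year", "Event", "Medal"]

a = toy[cols].loc["LEWIS, Carl"]   # columns first, then rows
b = toy.loc["LEWIS, Carl"][cols]   # rows first, then columns
c = toy.loc["LEWIS, Carl", cols]   # rows and columns in a single loc call

print(a.equals(b) and b.equals(c))
```

The single `loc` call is usually preferable, since it avoids building an intermediate object and also works for assignment.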

#### Outlook Pandas Objects

In [118]:
summer

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"HAJOS, Alfred",1896,Athens,Aquatics,Swimming,HUN,Men,100M Freestyle,Gold
"HERSCHMANN, Otto",1896,Athens,Aquatics,Swimming,AUT,Men,100M Freestyle,Silver
"DRIVAS, Dimitrios",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Bronze
"MALOKINIS, Ioannis",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Gold
"CHASAPIS, Spiridon",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Silver
...,...,...,...,...,...,...,...,...
"JANIKOWSKI, Damian",2012,London,Wrestling,Wrestling Freestyle,POL,Men,Wg 84 KG,Bronze
"REZAEI, Ghasem Gholamreza",2012,London,Wrestling,Wrestling Freestyle,IRI,Men,Wg 96 KG,Gold
"TOTROV, Rustam",2012,London,Wrestling,Wrestling Freestyle,RUS,Men,Wg 96 KG,Silver
"ALEKSANYAN, Artur",2012,London,Wrestling,Wrestling Freestyle,ARM,Men,Wg 96 KG,Bronze


In [119]:
type(summer)

pandas.core.frame.DataFrame

In [120]:
summer["Year"]

Athlete
HAJOS, Alfred                1896
HERSCHMANN, Otto             1896
DRIVAS, Dimitrios            1896
MALOKINIS, Ioannis           1896
CHASAPIS, Spiridon           1896
                             ... 
JANIKOWSKI, Damian           2012
REZAEI, Ghasem Gholamreza    2012
TOTROV, Rustam               2012
ALEKSANYAN, Artur            2012
LIDBERG, Jimmy               2012
Name: Year, Length: 31165, dtype: int64

In [121]:
type(summer["Year"])

pandas.core.series.Series

In [122]:
summer.columns

Index(['Year', 'City', 'Sport', 'Discipline', 'Country', 'Gender', 'Event',
       'Medal'],
      dtype='object')

In [123]:
type(summer.columns)

pandas.core.indexes.base.Index

In [124]:
summer.index

Index(['HAJOS, Alfred', 'HERSCHMANN, Otto', 'DRIVAS, Dimitrios',
       'MALOKINIS, Ioannis', 'CHASAPIS, Spiridon', 'CHOROPHAS, Efstathios',
       'HAJOS, Alfred', 'ANDREOU, Joannis', 'CHOROPHAS, Efstathios',
       'NEUMANN, Paul',
       ...
       'AHMADOV, Emin', 'KAZAKEVIC, Aleksandr', 'KHUGAEV, Alan',
       'EBRAHIM, Karam Mohamed Gaber', 'GAJIYEV, Danyal', 'JANIKOWSKI, Damian',
       'REZAEI, Ghasem Gholamreza', 'TOTROV, Rustam', 'ALEKSANYAN, Artur',
       'LIDBERG, Jimmy'],
      dtype='object', name='Athlete', length=31165)

In [125]:
type(summer.index)

pandas.core.indexes.base.Index
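
The cells above show the three building blocks: a DataFrame, the Series that make up its columns, and the Index objects that label both axes. A minimal sketch on a hypothetical two-row frame (`toy` is an assumption) also shows how the bracket style determines which object comes back.

```python
import pandas as pd

# Hypothetical two-row frame with a named row index.
toy = pd.DataFrame(
    {"Year": [1896, 2012], "Medal": ["Gold", "Bronze"]},
    index=pd.Index(["HAJOS, Alfred", "LIDBERG, Jimmy"], name="Athlete"),
)

one_col = toy["Year"]          # single label: a Series
one_col_frame = toy[["Year"]]  # list with one label: a one-column DataFrame

print(type(one_col).__name__, type(one_col_frame).__name__)

# The Series shares the row index of the DataFrame it came from.
print(one_col.index.equals(toy.index))
```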

### Advanced Indexing and Slicing (optional)

In [126]:
import pandas as pd

In [127]:
summer = pd.read_csv("summer.csv")

In [128]:
summer

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
0,1896,Athens,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100M Freestyle,Gold
1,1896,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100M Freestyle,Silver
2,1896,Athens,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100M Freestyle For Sailors,Bronze
3,1896,Athens,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100M Freestyle For Sailors,Gold
4,1896,Athens,Aquatics,Swimming,"CHASAPIS, Spiridon",GRE,Men,100M Freestyle For Sailors,Silver
...,...,...,...,...,...,...,...,...,...
31160,2012,London,Wrestling,Wrestling Freestyle,"JANIKOWSKI, Damian",POL,Men,Wg 84 KG,Bronze
31161,2012,London,Wrestling,Wrestling Freestyle,"REZAEI, Ghasem Gholamreza",IRI,Men,Wg 96 KG,Gold
31162,2012,London,Wrestling,Wrestling Freestyle,"TOTROV, Rustam",RUS,Men,Wg 96 KG,Silver
31163,2012,London,Wrestling,Wrestling Freestyle,"ALEKSANYAN, Artur",ARM,Men,Wg 96 KG,Bronze


__Case 1: Getting the first 5 rows and rows 354 and 765__

In [129]:
rows = list(range(5)) + [354, 765]
rows

[0, 1, 2, 3, 4, 354, 765]

In [130]:
summer.iloc[rows]

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
0,1896,Athens,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100M Freestyle,Gold
1,1896,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100M Freestyle,Silver
2,1896,Athens,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100M Freestyle For Sailors,Bronze
3,1896,Athens,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100M Freestyle For Sailors,Gold
4,1896,Athens,Aquatics,Swimming,"CHASAPIS, Spiridon",GRE,Men,100M Freestyle For Sailors,Silver
354,1900,Paris,Equestrian,Jumping,DE BELLEGARDE,FRA,Men,Long Jump Individual,Bronze
765,1904,St Louis,Athletics,Athletics,"CORAY, Albert",ZZX,Men,4Miles Team,Silver


__Case 2: Getting the first three columns and the columns "Gender" and "Event"__

In [131]:
summer.columns[:3].to_list() + ["Gender", "Event"]

['Year', 'City', 'Sport', 'Gender', 'Event']

In [132]:
col = summer.columns[:3].to_list() + ["Gender", "Event"]
col

['Year', 'City', 'Sport', 'Gender', 'Event']

In [133]:
summer.loc[:, col]

Unnamed: 0,Year,City,Sport,Gender,Event
0,1896,Athens,Aquatics,Men,100M Freestyle
1,1896,Athens,Aquatics,Men,100M Freestyle
2,1896,Athens,Aquatics,Men,100M Freestyle For Sailors
3,1896,Athens,Aquatics,Men,100M Freestyle For Sailors
4,1896,Athens,Aquatics,Men,100M Freestyle For Sailors
...,...,...,...,...,...
31160,2012,London,Wrestling,Men,Wg 84 KG
31161,2012,London,Wrestling,Men,Wg 96 KG
31162,2012,London,Wrestling,Men,Wg 96 KG
31163,2012,London,Wrestling,Men,Wg 96 KG


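__Case 3: Combining Position- and label-based Indexing__: Rows at positions 200 and 300 and columns "Athlete" and "Medal". With the default integer index, row labels and row positions coincide, so `loc` can take the positions directly.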

In [None]:
summer

In [134]:
summer.loc[[200, 300], ["Athlete", "Medal"]]

Unnamed: 0,Athlete,Medal
200,"KEMP, Peter",Gold
300,"SHELDON, Lewis Pendleton",Bronze
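
The `loc` call above works because the default index labels equal the row positions. When the row labels are something else, one option is to translate the column labels into positions and use `iloc`. The sketch below assumes a hypothetical frame whose index is 10, 20, 30, 40.

```python
import pandas as pd

# Hypothetical frame whose row labels differ from the row positions.
toy = pd.DataFrame(
    {
        "Athlete": ["A", "B", "C", "D"],
        "Year": [1896, 1900, 1904, 1908],
        "Medal": ["Gold", "Silver", "Bronze", "Gold"],
    },
    index=[10, 20, 30, 40],
)

# Translate column labels into integer positions, then select by position.
col_pos = toy.columns.get_indexer(["Athlete", "Medal"])
picked = toy.iloc[[1, 3], col_pos]

# Alternative: translate row positions into labels, then select by label.
picked_alt = toy.loc[toy.index[[1, 3]], ["Athlete", "Medal"]]

print(picked)
```

The second variant relies on the row labels being unique; with duplicate labels it would return additional rows.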


__Case 4: Combining Position- and label-based Indexing__: Rows "PHELPS, Michael" and positional columns 4 and 6

In [135]:
summer = pd.read_csv("summer.csv", index_col = "Athlete")

In [None]:
summer

In [136]:
col = summer.columns[[4, 6]]
col

Index(['Country', 'Event'], dtype='object')

In [137]:
summer.loc["PHELPS, Michael", col]

Unnamed: 0_level_0,Country,Event
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1
"PHELPS, Michael",USA,100M Butterfly
"PHELPS, Michael",USA,200M Butterfly
"PHELPS, Michael",USA,200M Freestyle
"PHELPS, Michael",USA,200M Individual Medley
"PHELPS, Michael",USA,400M Individual Medley
"PHELPS, Michael",USA,4X100M Freestyle Relay
"PHELPS, Michael",USA,4X100M Medley Relay
"PHELPS, Michael",USA,4X200M Freestyle Relay
"PHELPS, Michael",USA,100M Butterfly
"PHELPS, Michael",USA,200M Butterfly


In [None]:
# summer.ix["PHELPS, Michael", [4, 6]]  # .ix was deprecated in pandas 0.20 and removed in 1.0; use the .loc approach above
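
Since `.ix` is no longer available, mixed label and position selection has to be expressed with `loc` and `iloc`. The sketch below shows two equivalent spellings on a hypothetical frame (`toy`, its labels, and its columns are invented for illustration).

```python
import pandas as pd

# Hypothetical miniature of the medal table with a repeated athlete label.
toy = pd.DataFrame(
    {
        "Year": [2004, 2008, 2008],
        "City": ["Athens", "Beijing", "Beijing"],
        "Country": ["USA", "USA", "JAM"],
        "Event": ["100M Butterfly", "200M Butterfly", "100M"],
    },
    index=pd.Index(["P", "P", "B"], name="Athlete"),
)

# Option 1: turn the column positions into labels and make a single loc call.
via_loc = toy.loc[["P"], toy.columns[[2, 3]]]

# Option 2: select the rows by label, then the columns by position.
via_chain = toy.loc[["P"]].iloc[:, [2, 3]]

print(via_loc.equals(via_chain))
```

Passing the row label inside a list keeps the result a DataFrame even when the label occurs only once.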