# INDEXING AND SLICING

# Indexing in DataFrames


**1. Label-based Indexing with loc**
* Label-based indexing is used to select data by the labels of rows and columns.

In [9]:
import pandas as pd

# read_csv already returns a DataFrame, so no extra pd.DataFrame(...) wrapper is needed
df = pd.read_csv(r"c:\Users\Public\titanic_train.csv")

# Select a single row by label
print(df.loc[1],"\n")

# Select multiple rows by label
print(df.loc[1:2])

# Select a single value: row label 0, column "Name" (scalar access, not a slice)
df.loc[0, "Name"]



PassengerId                                                    2
Survived                                                       1
Pclass                                                         1
Name           Cumings, Mrs. John Bradley (Florence Briggs Th...
Sex                                                       female
Age                                                         38.0
SibSp                                                          1
Parch                                                          0
Ticket                                                  PC 17599
Fare                                                     71.2833
Cabin                                                        C85
Embarked                                                       C
Name: 1, dtype: object 

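   PassengerId  Survived  Pclass  \
1            2         1       1   
2            3         1       3   

                                                Name     Sex   Age  SibSp  \
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   

   Parch            Ticket     Fare Cabin Embarked  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  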

'Braund, Mr. Owen Harris'
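
Because the Titanic data uses the default integer index, labels and positions happen to coincide above. The following minimal sketch uses a hypothetical toy DataFrame (`toy`, with made-up string labels, not part of the dataset's index) to show that loc really looks up labels rather than positions.

```python
import pandas as pd

# Hypothetical toy data: the index holds string labels instead of 0, 1, 2, ...
toy = pd.DataFrame(
    {"Age": [22.0, 38.0, 26.0], "Fare": [7.25, 71.2833, 7.925]},
    index=["braund", "cumings", "heikkinen"],
)

row = toy.loc["cumings"]                             # one row, returned as a Series
value = toy.loc["cumings", "Fare"]                   # a single scalar value
subset = toy.loc[["braund", "heikkinen"], ["Age"]]   # lists of labels return a DataFrame

print(row, "\n")
print(value, "\n")
print(subset)
```

Passing a list of labels, even a one-element list, keeps the result as a DataFrame, while a single label reduces that axis.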

**2. Positional Indexing with iloc**
* Positional indexing is used to select data by the integer location of rows and columns.

In [12]:
# Select a single row by position
print(df.iloc[1],"\n") # ==>  Selects the second row (index position 1).

# Select multiple rows by position
print(df.iloc[1:3],"\n")  # ==> Selects the second and third rows (index positions 1 and 2).

# Select specific rows and columns by position
print(df.iloc[1:3, 0:2]) # ==> Selects the second and third rows, and the first and second columns.



PassengerId                                                    2
Survived                                                       1
Pclass                                                         1
Name           Cumings, Mrs. John Bradley (Florence Briggs Th...
Sex                                                       female
Age                                                         38.0
SibSp                                                          1
Parch                                                          0
Ticket                                                  PC 17599
Fare                                                     71.2833
Cabin                                                        C85
Embarked                                                       C
Name: 1, dtype: object 

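   PassengerId  Survived  Pclass  \
1            2         1       1   
2            3         1       3   

                                                Name     Sex   Age  SibSp  \
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   

   Parch            Ticket     Fare Cabin Embarked  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S   

   PassengerId  Survived
1            2         1
2            3         1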
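
Positional indexing also accepts negative positions and lists of positions. This is a small sketch on a hypothetical toy DataFrame (`toy`, invented for illustration) whose labels are letters, so it is clear that iloc ignores the labels entirely.

```python
import pandas as pd

# Hypothetical toy data with non-integer row labels
toy = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40], "c": [100, 200, 300, 400]},
    index=["w", "x", "y", "z"],
)

last_row = toy.iloc[-1]             # negative positions count from the end
picked = toy.iloc[[0, 2], [0, 2]]   # rows 0 and 2, columns 0 and 2
cell = toy.iloc[1, 1]               # single value at row position 1, column position 1

print(last_row, "\n")
print(picked, "\n")
print(cell)
```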

**3. Boolean Indexing**
* Boolean indexing selects the rows for which a condition evaluates to True.

In [24]:
print(df[df['Age'] > 18])

     PassengerId  Survived  Pclass  \
0              1         0       3   
1              2         1       1   
2              3         1       3   
3              4         1       1   
4              5         0       3   
..           ...       ...     ...   
885          886         0       3   
886          887         0       2   
887          888         1       1   
889          890         1       1   
890          891         0       3   

                                                  Name     Sex   Age  SibSp  \
0                              Braund, Mr. Owen Harris    male  22.0      1   
1    Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                               Heikkinen, Miss. Laina  female  26.0      0   
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                             Allen, Mr. William Henry    male  35.0      0   
..                                                 ...     ...   ... 

In [25]:
# Select rows where Age equals 20, keeping only the 'Name' and 'Sex' columns
print(df.loc[df['Age'] == 20, ['Name', 'Sex']])


                               Name     Sex
12   Saundercock, Mr. William Henry    male
91       Andreasson, Mr. Paul Edvin    male
113         Jussila, Miss. Katriina  female
131  Coelho, Mr. Domingos Fernandeo    male
378             Betros, Mr. Tannous    male
404         Oreskovic, Miss. Marija  female
441                 Hampe, Mr. Leon    male
622                Nakid, Mr. Sahid    male
640          Jensen, Mr. Hans Peder    male
664     Lindqvist, Mr. Eino William    male
682     Olsvigen, Mr. Thor Anderson    male
725             Oreskovic, Mr. Luka    male
762           Barah, Mr. Hanna Assi    male
840     Alhomaki, Mr. Ilmari Rudolf    male
876   Gustafsson, Mr. Alfred Ossian    male
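
Conditions can be combined with `&` (and), `|` (or) and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. The sketch below uses a hypothetical toy DataFrame (`toy`) rather than the Titanic file so that it runs on its own.

```python
import pandas as pd

# Hypothetical toy data; None becomes NaN in the float Age column
toy = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, 15.0, None],
    }
)

# Both conditions must hold
adult_women = toy[(toy["Age"] > 18) & (toy["Sex"] == "female")]

# Either condition may hold; isna() catches the missing age explicitly
young_or_unknown = toy[(toy["Age"] < 18) | toy["Age"].isna()]

# Negate a condition and select one column at the same time with loc
not_male_names = toy.loc[~(toy["Sex"] == "male"), "Name"]

print(adult_women, "\n")
print(young_or_unknown, "\n")
print(not_male_names)
```

Comparisons against NaN are always False, so rows with a missing Age never satisfy `Age > 18` or `Age < 18` unless they are handled explicitly.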


**A key difference between loc and iloc: with loc, a slice such as 0:3 includes the end label 3, whereas with iloc the same slice stops before position 3.**
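
A minimal sketch of that difference, using a hypothetical five-row DataFrame (`toy`) with the default integer labels 0 to 4:

```python
import pandas as pd

toy = pd.DataFrame({"x": [10, 20, 30, 40, 50]})  # default labels 0..4

by_label = toy.loc[0:3]      # labels 0, 1, 2 and 3: the end label is included
by_position = toy.iloc[0:3]  # positions 0, 1 and 2: the end position is excluded

print(len(by_label), len(by_position))
```

The same slice therefore returns four rows with loc and three rows with iloc.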

In [28]:
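# Boolean mask over the column names: True where a name contains 'Type'.
# No column name in this dataset contains 'Type', so every entry is False.
mask = df.columns.str.contains('Type')
mask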


array([False, False, False, False, False, False, False, False, False,
       False, False, False])
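
A column mask like this becomes useful once it is passed to loc to keep only the matching columns. The sketch below assumes a hypothetical three-column DataFrame (`toy`) and matches on the substring 'P', which is chosen only so that some entries come out True.

```python
import pandas as pd

# Hypothetical toy data with three Titanic-style column names
toy = pd.DataFrame({"PassengerId": [1, 2], "Pclass": [3, 1], "Fare": [7.25, 71.28]})

mask = toy.columns.str.contains("P")  # case-sensitive substring match on column names
selected = toy.loc[:, mask]           # keep every row, only the columns where mask is True

print(mask)
print(selected)
```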

# Slicing in DataFrames

**1. Slicing Rows**
* You can slice rows using label-based or positional indexing.

In [14]:
# Label-based slicing
print(df.loc[1:2],'\n')  # Slicing rows with labels 1 to 2

# Positional slicing
print(df.iloc[0:2])  # Slicing the first two rows


   PassengerId  Survived  Pclass  \
1            2         1       1   
2            3         1       3   

                                                Name     Sex   Age  SibSp  \
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   

   Parch            Ticket     Fare Cabin Embarked  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S   

   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   

   Parch     Ticket     Fare Cabin Embarked  
0      0  A/5 21171   7.2500   NaN        S  
1      0   PC 17599  71.2833   C85        C  


**2. Slicing Columns**
* You can also slice columns using label-based or positional indexing.

In [18]:
# Label-based slicing
print(df.loc[:, 'Parch':'Ticket'],"\n")  # Slicing columns 'Parch' through 'Ticket' (both ends included)

# Positional slicing
print(df.iloc[:, 0:2])  # Slicing the first two columns

     Parch            Ticket
0        0         A/5 21171
1        0          PC 17599
2        0  STON/O2. 3101282
3        0            113803
4        0            373450
..     ...               ...
886      0            211536
887      0            112053
888      2        W./C. 6607
889      0            111369
890      0            370376

[891 rows x 2 columns] 

     PassengerId  Survived
0              1         0
1              2         1
2              3         1
3              4         1
4              5         0
..           ...       ...
886          887         0
887          888         1
888          889         0
889          890         1
890          891         0

[891 rows x 2 columns]


**3. Slicing Rows and Columns Together**
* You can combine row and column slicing.


In [23]:
print(df.iloc[1:3, 0:2])  # Slicing rows 1 to 2 and columns 0 to 1


   PassengerId  Survived
1            2         1
2            3         1


# RESHAPING AND TRANSFORMING

# Reshaping

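**1. Pivot**
* Pivoting reshapes data from long to wide format: one column supplies the row index, the unique values of a second column become the new column labels, and a third column fills the cells. Combinations that do not occur in the data are filled with NaN.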

In [35]:
# Pivot the DataFrame
pivot_df = df.pivot(index='Name', columns='Ticket', values='Cabin')
print(pivot_df)

Ticket                                110152 110413 110465 110564 110813  \
Name                                                                       
Abbing, Mr. Anthony                      NaN    NaN    NaN    NaN    NaN   
Abbott, Mr. Rossmore Edward              NaN    NaN    NaN    NaN    NaN   
Abbott, Mrs. Stanton (Rosa Hunt)         NaN    NaN    NaN    NaN    NaN   
Abelson, Mr. Samuel                      NaN    NaN    NaN    NaN    NaN   
Abelson, Mrs. Samuel (Hannah Wizosky)    NaN    NaN    NaN    NaN    NaN   
...                                      ...    ...    ...    ...    ...   
de Mulder, Mr. Theodore                  NaN    NaN    NaN    NaN    NaN   
de Pelsmaeker, Mr. Alfons                NaN    NaN    NaN    NaN    NaN   
del Carlo, Mr. Sebastiano                NaN    NaN    NaN    NaN    NaN   
van Billiard, Mr. Austin Blyler          NaN    NaN    NaN    NaN    NaN   
van Melkebeke, Mr. Philemon              NaN    NaN    NaN    NaN    NaN   

Ticket     
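
The Titanic pivot above is mostly NaN because each passenger has only one ticket. The following sketch uses a small hypothetical sales table (`long_df`, invented for illustration) where the reshaping is easier to see. It also shows that pivot raises a ValueError when an index/column pair occurs more than once, and that pivot_table can be used instead when such duplicates should be aggregated.

```python
import pandas as pd

# Hypothetical long-format data: one row per (city, year)
long_df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Rome", "Rome"],
        "year": [2020, 2021, 2020, 2021],
        "sales": [10, 12, 7, 9],
    }
)

# One row per city, one column per year
wide = long_df.pivot(index="city", columns="year", values="sales")
print(wide, "\n")

# Repeat the first row so that the pair (Oslo, 2020) appears twice
dup = pd.concat([long_df, long_df.iloc[[0]]], ignore_index=True)

pivot_error = None
try:
    dup.pivot(index="city", columns="year", values="sales")
except ValueError as exc:
    pivot_error = exc
    print("pivot failed:", exc, "\n")

# pivot_table aggregates the duplicated pair instead of failing
summed = dup.pivot_table(index="city", columns="year", values="sales", aggfunc="sum")
print(summed)
```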

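**2. Melting**
* Melting transforms data from a wide format to a long format: the columns listed in id_vars stay in place, and every other column is unpivoted into a pair of name and value columns.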

In [38]:
pd.melt(df, id_vars=['PassengerId', 'Name'], var_name='Variable', value_name='Value')

      PassengerId                                               Name  Variable  Value
   0            1                            Braund, Mr. Owen Harris  Survived      0
   1            2  Cumings, Mrs. John Bradley (Florence Briggs Th...  Survived      1
   2            3                             Heikkinen, Miss. Laina  Survived      1
   3            4       Futrelle, Mrs. Jacques Heath (Lily May Peel)  Survived      1
   4            5                           Allen, Mr. William Henry  Survived      0
 ...          ...                                                ...       ...    ...
8905          887                              Montvila, Rev. Juozas  Embarked      S
8906          888                       Graham, Miss. Margaret Edith  Embarked      S
8907          889           Johnston, Miss. Catherine Helen "Carrie"  Embarked      S
8908          890                              Behr, Mr. Karl Howell  Embarked      C
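
Melting produces one output row for every combination of an input row and a melted column. The sketch below shows this on a hypothetical two-row table (`wide`, invented for illustration) and uses value_vars to choose explicitly which columns are unpivoted.

```python
import pandas as pd

# Hypothetical wide-format data: one column per subject
wide = pd.DataFrame({"id": [1, 2], "math": [90, 80], "art": [70, 60]})

long_df = pd.melt(
    wide,
    id_vars=["id"],               # kept as identifier columns
    value_vars=["math", "art"],   # columns to unpivot; omit to melt all non-id columns
    var_name="subject",
    value_name="score",
)

print(long_df)
```

Two input rows and two melted columns give four output rows.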


**3. Stacking and Unstacking**
* Stacking moves the column labels into the innermost level of the row index, producing a longer result with a MultiIndex. Unstacking does the reverse and moves the innermost index level back into the columns.

In [40]:
# Stack the DataFrame
stacked_df = pivot_df.stack()
print(stacked_df,"\n")

# Unstack the DataFrame
unstacked_df = stacked_df.unstack()
print(unstacked_df)


Name                                             Ticket  
Allen, Miss. Elisabeth Walton                    24160            B5
Allison, Master. Hudson Trevor                   113781      C22 C26
Allison, Miss. Helen Loraine                     113781      C22 C26
Allison, Mrs. Hudson J C (Bessie Waldo Daniels)  113781      C22 C26
Anderson, Mr. Harry                              19952           E12
                                                              ...   
Wick, Miss. Mary Natalie                         36928            C7
Widener, Mr. Harry Elkins                        113503          C82
Williams-Lambert, Mr. Fletcher Fellows           113510         C128
Woolner, Mr. Hugh                                19947           C52
Young, Miss. Marie Grice                         PC 17760        C32
Length: 204, dtype: object 

Ticket                                          24160   113781 19952 13502  \
Name                                                                        
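
A round trip on a small hypothetical table (`wide`, invented for illustration) makes the two operations easier to follow than the sparse Titanic pivot.

```python
import pandas as pd

# Hypothetical wide-format data indexed by student name
wide = pd.DataFrame({"math": [90, 80], "art": [70, 60]}, index=["ann", "bob"])

stacked = wide.stack()        # Series indexed by (student, subject)
restored = stacked.unstack()  # subject level moved back into the columns

print(stacked, "\n")
print(restored)
```

Individual values in the stacked Series are addressed with a tuple of labels, for example `stacked.loc[("ann", "art")]`.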

# TRANSFORMING

**GroupBy and Aggregation**

In [43]:
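# Mean Fare per (Name, Age) pair; rows with a missing Age are dropped by default, leaving 714 of the 891 rows
print(df.groupby(['Name', 'Age'])['Fare'].mean().reset_index())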


                                             Name   Age     Fare
0                             Abbing, Mr. Anthony  42.0   7.5500
1                     Abbott, Mr. Rossmore Edward  16.0  20.2500
2                Abbott, Mrs. Stanton (Rosa Hunt)  35.0  20.2500
3                             Abelson, Mr. Samuel  30.0  24.0000
4           Abelson, Mrs. Samuel (Hannah Wizosky)  28.0  24.0000
..                                            ...   ...      ...
709  de Messemaeker, Mrs. Guillaume Joseph (Emma)  36.0  17.4000
710                       de Mulder, Mr. Theodore  30.0   9.5000
711                     de Pelsmaeker, Mr. Alfons  16.0   9.5000
712                     del Carlo, Mr. Sebastiano  29.0  27.7208
713               van Billiard, Mr. Austin Blyler  40.5  14.5000

[714 rows x 3 columns]
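
In the output above, each group appears to hold a single passenger, so the mean simply repeats that passenger's fare. Aggregation is more informative when the grouping keys repeat. The sketch below groups a hypothetical toy table (`toy`, with invented values) by class and sex, and the output column names `mean_fare` and `passengers` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical toy data with repeating group keys
toy = pd.DataFrame(
    {
        "Pclass": [1, 1, 3, 3, 3],
        "Sex": ["female", "male", "male", "female", "male"],
        "Fare": [80.0, 50.0, 8.0, 10.0, 6.0],
    }
)

# One aggregate per class
fare_by_class = toy.groupby("Pclass")["Fare"].mean()

# Several named aggregates per (class, sex) group
summary = (
    toy.groupby(["Pclass", "Sex"])["Fare"]
    .agg(mean_fare="mean", passengers="count")
    .reset_index()
)

print(fare_by_class, "\n")
print(summary)
```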


**Applying Functions**

In [44]:
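# Label each passenger 'Adult' (Age > 18) or 'Child' (otherwise); this overwrites the numeric Age column.
# Caution: a missing age (NaN) fails the x > 18 test, so it is also labelled 'Child'.
df['Age'] = df['Age'].apply(lambda x: 'Adult' if x > 18 else 'Child')
df['Age']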

0      Adult
1      Adult
2      Adult
3      Adult
4      Adult
       ...  
886    Adult
887    Adult
888    Child
889    Adult
890    Adult
Name: Age, Length: 891, dtype: object
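
One possible alternative, sketched here on a hypothetical toy DataFrame (`toy`), writes the labels to a new column (the name `AgeGroup` is an assumption made for this example) and uses pd.cut, which leaves missing ages as NaN instead of labelling them. The bin edges reproduce the same cut-off as the lambda: 18 and below is 'Child', above 18 is 'Adult'.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data including a missing age and the boundary value 18
toy = pd.DataFrame({"Age": [22.0, 38.0, 4.0, np.nan, 18.0]})

# Intervals are closed on the right: (-inf, 18] -> 'Child', (18, inf] -> 'Adult'
toy["AgeGroup"] = pd.cut(
    toy["Age"],
    bins=[-np.inf, 18, np.inf],
    labels=["Child", "Adult"],
)

print(toy)
```

Keeping the numeric Age column intact means that earlier filters such as `Age > 18` still work after this step.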

**Duplicates**

In [45]:
# Keep the first row for each (Age, Fare) combination and drop the later repeats
deduped_df = df.drop_duplicates(subset=['Age', 'Fare'])
deduped_df

     PassengerId  Survived  Pclass  \
0              1         0       3   
1              2         1       1   
2              3         1       3   
3              4         1       1   
4              5         0       3   
..           ...       ...     ...   
867          868         0       1   
872          873         0       1   
876          877         0       3   
882          883         0       3   

                                                  Name     Sex    Age  SibSp  \
0                              Braund, Mr. Owen Harris    male  Adult      1   
1    Cumings, Mrs. John Bradley (Florence Briggs Th...  female  Adult      1   
2                               Heikkinen, Miss. Laina  female  Adult      0   
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  Adult      1   
4                             Allen, Mr. William Henry    male  Adult      0   
..                                                 ...     ...    ...    ...   
867               Roebling, Mr. Washington Augustus II    male  Adult      0   
872                           Carlsson, Mr. Frans Olof    male  Adult      0   
876                      Gustafsson, Mr. Alfred Ossian    male  Adult      0   
882                       Dahlberg, Miss. Gerda Ulrika  female  Adult      0   

     Parch            Ticket     Fare        Cabin  Embarked
0        0         A/5 21171   7.2500          NaN         S
1        0          PC 17599  71.2833          C85         C
2        0  STON/O2. 3101282   7.9250          NaN         S
3        0            113803  53.1000         C123         S
4        0            373450   8.0500          NaN         S
..     ...               ...      ...          ...       ...
867      0          PC 17590  50.4958          A24         S
872      0               695   5.0000  B51 B53 B55         S
876      0              7534   9.8458          NaN         S
882      0              7552  10.5167          NaN         S
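
Before dropping anything, it can help to inspect which rows count as duplicates. This sketch, on a hypothetical four-row table (`toy`), uses duplicated to flag the repeats and the keep parameter of drop_duplicates to control which copy survives.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 1 share the same (Age, Fare) pair
toy = pd.DataFrame(
    {"Age": ["Adult", "Adult", "Child", "Adult"], "Fare": [7.25, 7.25, 7.25, 8.05]}
)

flags = toy.duplicated(subset=["Age", "Fare"])                        # True for repeats after the first
first_kept = toy.drop_duplicates(subset=["Age", "Fare"])              # keep="first" is the default
last_kept = toy.drop_duplicates(subset=["Age", "Fare"], keep="last")  # keep the last copy instead
none_kept = toy.drop_duplicates(subset=["Age", "Fare"], keep=False)   # drop every duplicated row

print(flags, "\n")
print(first_kept, "\n")
print(last_kept, "\n")
print(none_kept)
```

The original index labels are preserved, which is why the deduplicated output has gaps in its index.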
