**Introduction to DataFrames**
A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Data is arranged in rows and columns, much like a spreadsheet or a SQL table. A DataFrame has three principal components: the data, the row labels (index), and the column labels.
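
The three components and the terms "heterogeneous" and "size-mutable" can be inspected directly. The following is a minimal sketch using made-up sample data (the names and values are illustrative assumptions, not part of the original notebook):

```python
import pandas as pd

# Hypothetical sample data: one text column and one integer column.
df = pd.DataFrame({'Name': ['Tom', 'Nick'], 'Age': [20, 21]})

row_labels = list(df.index)        # row labels (the index)
column_labels = list(df.columns)   # column labels
values = df.to_numpy()             # the underlying data as a 2-D NumPy array

# Heterogeneous: each column keeps its own dtype.
print(df.dtypes)

# Size-mutable: a column can be added after construction.
df['City'] = ['Delhi', 'Pune']
print(df.shape)  # (2, 3)
```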

Basic Operations:

1.  Creating a DataFrame
2.  Dealing with rows and columns
3.  Indexing and selecting data
4.  Iterating over rows and columns

In [1]:
# Creating a DataFrame from a list
import pandas as pd

# list of strings
lst = ['Apple' ,'Banana', 'Mango' , 'Grapes']

# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)

        0
0   Apple
1  Banana
2   Mango
3  Grapes


In [2]:
# Creating a DataFrame from a dict of ndarrays/lists
import pandas as pd

# initialise the data as a dict of lists.
data = {'Name':['Tom', 'nick', 'krish', 'jack'],
        'Age':[20, 21, 19, 18]}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.
print(df)

    Name  Age
0    Tom   20
1   nick   21
2  krish   19
3   jack   18


**Dealing with rows and columns**
Because a DataFrame arranges data in rows and columns, we can perform basic operations on either axis, such as selecting, deleting, adding, and renaming.
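
Adding, renaming, and deleting can be sketched as follows. This is an illustrative example on a small made-up DataFrame; it assumes the common pattern of reassigning the result, since rename() and drop() return a new DataFrame by default:

```python
import pandas as pd

# Hypothetical two-row DataFrame for illustration.
df = pd.DataFrame({'Name': ['Jai', 'Princi'], 'Age': [27, 24]})

# Adding a column: assign a list with one value per row.
df['City'] = ['Delhi', 'Kanpur']

# Renaming a column: rename() returns a new DataFrame by default.
df = df.rename(columns={'City': 'Address'})

# Deleting a column: drop() also returns a new DataFrame.
df = df.drop(columns=['Age'])

# Deleting a row by its index label.
df = df.drop(index=[0])

print(df)
```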


In [3]:
# Column selection
import pandas as pd

# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# select two columns
print(df[['Name', 'Qualification']])

     Name Qualification
0     Jai           Msc
1  Princi            MA
2  Gaurav           MCA
3    Anuj           Phd


Row Selection
pandas provides several ways to retrieve rows from a DataFrame. The DataFrame.loc[] indexer selects rows by label, and the DataFrame.iloc[] indexer selects rows by integer position. Rows can also be filtered with a boolean condition, the query() method, or isin().
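
With the default index, labels and positions coincide, which can hide the difference between .loc[] and .iloc[]. The sketch below uses a hypothetical non-default index (10, 20, 30) to separate the two, and also shows that label slices include the end label while position slices exclude the end position:

```python
import pandas as pd

# Hypothetical data with a non-default integer index, so labels != positions.
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']}, index=[10, 20, 30])

by_label = df.loc[10]       # row whose label is 10
by_position = df.iloc[0]    # row at position 0 (the same row here)

label_slice = df.loc[10:20]     # label slices include the end label
position_slice = df.iloc[0:1]   # position slices exclude the end position

print(len(label_slice), len(position_slice))  # 2 1
```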

In [4]:
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']}
df = pd.DataFrame(data)

# Selecting rows using integer indexing with .iloc[]
first_row = df.iloc[0]
rows_2_to_4 = df.iloc[1:4]
specific_rows = df.iloc[[0, 3, 4]]

print("Rows selected using .iloc[]:")
print("First row:\n", first_row)
print("\nRows 2 to 4:\n", rows_2_to_4)
print("\nSpecific rows:\n", specific_rows)

# Selecting rows using label indexing with .loc[]
rows_with_labels = df.loc[[0, 2, 4]]
print("\nRows selected using .loc[]:")
print("Rows with labels 0, 2, and 4:\n", rows_with_labels)

# Selecting rows using boolean indexing
condition = df['Age'] > 30
rows_matching_condition = df[condition]
print("\nRows selected using boolean indexing (Age > 30):\n", rows_matching_condition)

# Selecting rows using query() method
selected_rows = df.query('Age > 30 and City == "New York"')
print("\nRows selected using query() method (Age > 30 and City is New York):\n", selected_rows)

# Selecting rows using isin() method
selected_rows = df[df['Name'].isin(['Alice', 'Charlie'])]
print("\nRows selected using isin() method (Name is Alice or Charlie):\n", selected_rows)


Rows selected using .iloc[]:
First row:
 Name       Alice
Age           25
City    New York
Name: 0, dtype: object

Rows 2 to 4:
       Name  Age         City
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston

Specific rows:
     Name  Age      City
0  Alice   25  New York
3  David   40   Houston
4  Emily   45     Miami

Rows selected using .loc[]:
Rows with labels 0, 2, and 4:
       Name  Age      City
0    Alice   25  New York
2  Charlie   35   Chicago
4    Emily   45     Miami

Rows selected using boolean indexing (Age > 30):
       Name  Age     City
2  Charlie   35  Chicago
3    David   40  Houston
4    Emily   45    Miami

Rows selected using query() method (Age > 30 and City is New York):
 Empty DataFrame
Columns: [Name, Age, City]
Index: []

Rows selected using isin() method (Name is Alice or Charlie):
       Name  Age      City
0    Alice   25  New York
2  Charlie   35   Chicago


Indexing and Selecting Data
Indexing in pandas means selecting particular rows and columns of data from a DataFrame. It can mean selecting all of the rows and some of the columns, some of the rows and all of the columns, or some of each. Indexing is also known as subset selection.
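
The three kinds of subset described above can be sketched with .loc[]. The data here is a shortened, illustrative version of the sample used elsewhere in this notebook:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'Los Angeles', 'Chicago']})

# All rows, some columns.
some_columns = df.loc[:, ['Name', 'Age']]

# Some rows, all columns.
some_rows = df.loc[[0, 2], :]

# Some rows and some columns.
some_of_each = df.loc[[0, 2], ['Name', 'City']]

print(some_columns.shape, some_rows.shape, some_of_each.shape)  # (3, 2) (2, 3) (2, 2)
```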

Indexing a DataFrame using the indexing operator []:
The indexing operator refers to the square brackets following an object. The .loc and .iloc indexers also use square brackets to make selections, but in this section "indexing operator" refers to df[] itself.
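
What df[] returns depends on what is placed inside the brackets. The following is a small sketch on illustrative data covering the common cases:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

as_series = df['Age']         # a single label returns a Series
as_frame = df[['Age']]        # a list of labels returns a DataFrame
row_slice = df[0:2]           # a slice selects rows by position
masked = df[df['Age'] > 25]   # a boolean Series filters rows

print(type(as_series).__name__, type(as_frame).__name__)  # Series DataFrame
```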



In [5]:
#Selecting a single column
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']}
df = pd.DataFrame(data)

# Selecting a single column by name
name_column = df['Name']

print("Selected single column 'Name':\n", name_column)

# Selecting a single column using attribute-style access
age_column = df.Age

print("\nSelected single column 'Age' using attribute-style access:\n", age_column)


Selected single column 'Name':
 0      Alice
1        Bob
2    Charlie
3      David
4      Emily
Name: Name, dtype: object

Selected single column 'Age' using attribute-style access:
 0    25
1    30
2    35
3    40
4    45
Name: Age, dtype: int64


In [6]:
#Selecting multiple columns
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']}
df = pd.DataFrame(data)

# Selecting multiple columns by names
selected_columns = df[['Name', 'Age']]

print("Selected columns 'Name' and 'Age':\n", selected_columns)


Selected columns 'Name' and 'Age':
       Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   40
4    Emily   45


In [8]:
#Indexing
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']}
df = pd.DataFrame(data)

# Setting 'Name' column as index
df.set_index('Name', inplace=True)

# Selecting row by label using .loc[]
row_by_label = df.loc['Alice']
print("Row selected by label ('Alice'):\n", row_by_label)

# Selecting row by position using .iloc[]
row_by_position = df.iloc[0]
print("\nRow selected by position (first row):\n", row_by_position)

# Selecting column by label
column_by_label = df['Age']
print("\nColumn selected by label ('Age'):\n", column_by_label)

# Selecting column by position
column_by_position = df.iloc[:, 1]
print("\nColumn selected by position (second column):\n", column_by_position)

# Selecting subset of rows and columns by label using .loc[]
subset_by_label = df.loc[['Alice', 'Charlie'], ['Age', 'City']]
print("\nSubset selected by label:\n", subset_by_label)

# Selecting subset of rows and columns by position using .iloc[]
subset_by_position = df.iloc[[0, 2], [0, 1]]
print("\nSubset selected by position:\n", subset_by_position)


Row selected by label ('Alice'):
 Age           25
City    New York
Name: Alice, dtype: object

Row selected by position (first row):
 Age           25
City    New York
Name: Alice, dtype: object

Column selected by label ('Age'):
 Name
Alice      25
Bob        30
Charlie    35
David      40
Emily      45
Name: Age, dtype: int64

Column selected by position (second column):
 Name
Alice         New York
Bob        Los Angeles
Charlie        Chicago
David          Houston
Emily            Miami
Name: City, dtype: object

Subset selected by label:
          Age      City
Name                  
Alice     25  New York
Charlie   35   Chicago

Subset selected by position:
          Age      City
Name                  
Alice     25  New York
Charlie   35   Chicago


Iterating over rows and columns
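Iteration is a general term for taking each item of something, one after another. A pandas DataFrame consists of rows and columns, and iterating over it directly behaves like iterating over a dictionary: it yields the column labels. To walk through rows or columns explicitly, pandas provides dedicated iteration methods such as iterrows().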
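
The dictionary-like behaviour can be seen in a short sketch. The data is a trimmed, illustrative version of the sample used in this section, and the variable names are assumptions made for the example:

```python
import pandas as pd

df = pd.DataFrame({'name': ['aparna', 'pankaj'], 'score': [90, 40]})

# Iterating over the DataFrame itself yields column labels, like dict keys.
labels = [label for label in df]

# items() yields (label, Series) pairs, like dict.items().
column_lengths = {label: len(column) for label, column in df.items()}

print(labels)          # ['name', 'score']
print(column_lengths)  # {'name': 2, 'score': 2}
```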

In [9]:
#Iterating over rows
import pandas as pd

# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}

# creating a dataframe from a dictionary
df = pd.DataFrame(data)

# iterating over rows using iterrows(); each step yields (index label, row as a Series)
for index, row in df.iterrows():
    print(index, row)
    print()

0 name      aparna
degree       MBA
score         90
Name: 0, dtype: object

1 name      pankaj
degree       BCA
score         40
Name: 1, dtype: object

2 name      sudhir
degree    M.Tech
score         80
Name: 2, dtype: object

3 name      Geeku
degree      MBA
score        98
Name: 3, dtype: object



In [None]:
#Iterating over columns
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']}
df = pd.DataFrame(data)

# Method 1: Iterating over columns using items()
# (iteritems() was removed in pandas 2.0; items() is its replacement)
print("Iterating over columns using items():")
for column_name, column_data in df.items():
    print("Column name:", column_name)
    print("Column data:")
    print(column_data)
    print()

# Method 2: Iterating over columns using a simple loop
print("Iterating over columns using a simple loop:")
for column_name in df.columns:
    print("Column name:", column_name)
    print("Column data:")
    print(df[column_name])
    print()

# Method 3: itertuples() iterates over rows, not columns;
# each row is a namedtuple whose first element is the index label
print("Iterating over rows using itertuples():")
for row_tuple in df.itertuples():
    print("Row index:", row_tuple[0])
    print("Row data:")
    print(row_tuple[1:])
    print()
