## 1. How to import files


### List as an example

In [None]:
# importing pandas library
import pandas as pd
 
# creating and initializing a nested list
age_list = [['Afghanistan', 1952, 8.42, 'Asia'],
            ['Australia', 1957, 9.71, 'Oceania'],
            ['Brazil', 1962, 76.04, 'Americas'],
            ['China', 1957, 637.4, 'Asia'],
            ['France', 1957, 44.3, 'Europe'],
            ['India', 1952, 372, 'Asia'],
            ['United States', 1957, 171, 'Americas']]
 
# creating a pandas dataframe
df = pd.DataFrame(age_list, columns=['Country', 'Year',
                                     'Population(million)', 'Continent'])
 
df

In [None]:
df.loc[0:1]
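A point worth noting about the slice above: `loc` selects by index label and includes both endpoints, whereas `iloc` selects by position and excludes the stop position. The following is a minimal sketch on a shortened copy of the table; the names `by_label` and `by_position` are illustrative only.

```python
import pandas as pd

# Shortened copy of the table above, kept small for illustration
df = pd.DataFrame(
    [['Afghanistan', 1952, 8.42, 'Asia'],
     ['Australia', 1957, 9.71, 'Oceania'],
     ['Brazil', 1962, 76.04, 'Americas']],
    columns=['Country', 'Year', 'Population(million)', 'Continent'])

# loc slices by label and includes both endpoints (rows 0 and 1)
by_label = df.loc[0:1]

# iloc slices by position and excludes the stop position (row 0 only)
by_position = df.iloc[0:1]

print(len(by_label))
print(len(by_position))
```

With the default integer index the labels and positions coincide, which is why the two calls look alike but return different numbers of rows.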

### Basic info

The head/tail/info methods and the dtypes attribute are convenient for a first check.

In [None]:
df.info()

In [None]:
df.dtypes

head and tail show the first and last rows of the DataFrame:

In [None]:
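# A notebook cell displays only its last expression, so print both results
print(df.head())  # first 5 rows by default
print(df.tail())  # last 5 rows by default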

In [None]:
# summary statistics for the numeric columns
df.describe()

In [65]:
df.sort_values(by=['Population(million)'])
#df.sort_values(by=['Population(million)'], ascending=False)

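|   | Country | Year | Population(million) | Continent |
|---|---------------|------|--------|----------|
| 0 | Afghanistan | 1952 | 8.42 | Asia |
| 1 | Australia | 1957 | 9.71 | Oceania |
| 4 | France | 1957 | 44.3 | Europe |
| 2 | Brazil | 1962 | 76.04 | Americas |
| 6 | United States | 1957 | 171.0 | Americas |
| 5 | India | 1952 | 372.0 | Asia |
| 3 | China | 1957 | 637.4 | Asia |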
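Sorting can also be descending or use more than one key. The sketch below rebuilds the same table and shows both variants; the variable names `largest_first` and `by_continent` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [['Afghanistan', 1952, 8.42, 'Asia'],
     ['Australia', 1957, 9.71, 'Oceania'],
     ['Brazil', 1962, 76.04, 'Americas'],
     ['China', 1957, 637.4, 'Asia'],
     ['France', 1957, 44.3, 'Europe'],
     ['India', 1952, 372, 'Asia'],
     ['United States', 1957, 171, 'Americas']],
    columns=['Country', 'Year', 'Population(million)', 'Continent'])

# Largest population first
largest_first = df.sort_values(by='Population(million)', ascending=False)

# Alphabetical by continent, then largest population first within each continent
by_continent = df.sort_values(by=['Continent', 'Population(million)'],
                              ascending=[True, False])

print(by_continent[['Continent', 'Country', 'Population(million)']])

# sort_values returns a new DataFrame; df keeps its original row order
print(df.index.tolist())
```

Because `sort_values` does not modify `df` in place by default, assign the result to a name if the sorted order is needed later.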


## 2. Subsetting Dataframes

### Columns

In [None]:
df[["Continent", "Year"]]

In [64]:
df.iloc[:, 3]

0        Asia
1     Oceania
2    Americas
3        Asia
4      Europe
5        Asia
6    Americas
Name: Continent, dtype: object

### Rows

In [None]:
df[df["Population(million)"] > 100]

In [None]:
df[df["Continent"] == "Asia"]
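The two row filters above can be combined. A minimal sketch follows, assuming the same table as in section 1: each condition goes in its own parentheses and is joined with `&` (and) or `|` (or), and `isin` tests membership in a list of values. The names `asia_large` and `europe_or_oceania` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [['Afghanistan', 1952, 8.42, 'Asia'],
     ['Australia', 1957, 9.71, 'Oceania'],
     ['Brazil', 1962, 76.04, 'Americas'],
     ['China', 1957, 637.4, 'Asia'],
     ['France', 1957, 44.3, 'Europe'],
     ['India', 1952, 372, 'Asia'],
     ['United States', 1957, 171, 'Americas']],
    columns=['Country', 'Year', 'Population(million)', 'Continent'])

# Rows that satisfy both conditions; the parentheses are required
asia_large = df[(df['Continent'] == 'Asia') & (df['Population(million)'] > 100)]
print(asia_large)

# Rows whose continent is any of the listed values
europe_or_oceania = df[df['Continent'].isin(['Europe', 'Oceania'])]
print(europe_or_oceania)
```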

## 3. Creating a new column from existing columns

In [76]:
import pandas as pd

# Creating the dataset with student names
df1 = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Maths': [9, 8, 7],
    'English': [4, 10, 6],
    'Science': [8, 7, 8],
    'History': [9, 6, 5]
})

In [77]:
# Calculating the total score for each student and adding it as a new column 'Total'
df1['Total'] = df1[['Maths', 'English', 'Science', 'History']].sum(axis=1)

# Display the updated dataset
print(df1)

   Student  Maths  English  Science  History  Total
0    Alice      9        4        8        9     30
1      Bob      8       10        7        6     31
2  Charlie      7        6        8        5     26
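The same pattern works for other row-wise calculations. As a sketch, the block below adds an 'Average' column next to 'Total'; the column name 'Average' and the helper list `subjects` are illustrative and not part of the original notebook.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Maths': [9, 8, 7],
    'English': [4, 10, 6],
    'Science': [8, 7, 8],
    'History': [9, 6, 5]
})

# Columns that take part in the calculation
subjects = ['Maths', 'English', 'Science', 'History']

# axis=1 computes across the columns of each row
df1['Total'] = df1[subjects].sum(axis=1)
df1['Average'] = df1[subjects].mean(axis=1)

print(df1)
```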


## 4. Grouping and Aggregation

Suppose you have a DataFrame df2 with sales data for different stores and products:

In [106]:
#del df1['Total']

data = {
    'Store': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Product': ['Apples', 'Oranges', 'Bananas', 'Apples', 'Bananas', 'Bananas'],
    'Sales': [10, 15, 9, 20, 14, 25]
}

df2 = pd.DataFrame(data)

In [109]:
df2

|   | Store | Product | Sales |
|---|-------|---------|-------|
| 0 | A | Apples | 10 |
| 1 | B | Oranges | 15 |
| 2 | A | Bananas | 9 |
| 3 | B | Apples | 20 |
| 4 | A | Bananas | 14 |
| 5 | B | Bananas | 25 |


You can group this data by the 'Store' column:

In [107]:
grouped = df2.groupby('Store')


You can then print out each group:



In [108]:
for name, group in grouped:
    print(f"Store: {name}")
    print(group)

Store: A
  Store  Product  Sales
0     A   Apples     10
2     A  Bananas      9
4     A  Bananas     14
Store: B
  Store  Product  Sales
1     B  Oranges     15
3     B   Apples     20
5     B  Bananas     25


### Aggregation

After grouping the data, you might want to perform some calculations that summarize the groups. Common aggregations include sum, mean, median, min, and max.

Continuing with the grouped DataFrame above, you can calculate the total sales for each store:

In [110]:
total_sales = grouped['Sales'].sum()
print(total_sales)

Store
A    33
B    60
Name: Sales, dtype: int64
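Several of the aggregations listed above can be computed in one call with `agg`. A minimal sketch on the same sales data follows; the variable name `summary` is illustrative only.

```python
import pandas as pd

df2 = pd.DataFrame({
    'Store': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Product': ['Apples', 'Oranges', 'Bananas', 'Apples', 'Bananas', 'Bananas'],
    'Sales': [10, 15, 9, 20, 14, 25]
})

# One row per store, one column per aggregation
summary = df2.groupby('Store')['Sales'].agg(['sum', 'mean', 'max'])
print(summary)
```

The result is a DataFrame indexed by 'Store', so individual values can be read with `summary.loc['A', 'mean']`.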


### Multiple Grouping

You can also group by multiple columns. For example, grouping by both 'Store' and 'Product' gives a breakdown at a more detailed level:

In [114]:
grouped_complex = df2.groupby(['Store', 'Product'])
total_sales_complex = grouped_complex['Sales'].sum()
print(total_sales_complex)

Store  Product
A      Apples     10
       Bananas    23
B      Apples     20
       Bananas    25
       Oranges    15
Name: Sales, dtype: int64
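The result of a multi-column groupby is a Series with a two-level index. Depending on what comes next, it may be handier as a flat table or as a store-by-product grid. The sketch below shows both conversions; the names `totals`, `flat`, and `wide` are illustrative only.

```python
import pandas as pd

df2 = pd.DataFrame({
    'Store': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Product': ['Apples', 'Oranges', 'Bananas', 'Apples', 'Bananas', 'Bananas'],
    'Sales': [10, 15, 9, 20, 14, 25]
})

totals = df2.groupby(['Store', 'Product'])['Sales'].sum()

# Turn the index levels back into ordinary columns
flat = totals.reset_index()
print(flat)

# Pivot the inner level ('Product') into columns; missing combinations become 0
wide = totals.unstack(fill_value=0)
print(wide)
```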


## 5. Join

In [115]:
# Create the first dataframe
df1 = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
})

# Create the second dataframe
df2 = pd.DataFrame({
    'EmployeeID': [3, 4, 5, 6],
    'Department': ['HR', 'IT', 'Finance', 'Marketing']
})
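The notebook stops after creating the two DataFrames. One way to join them, sketched here as an assumption about the intended next step, is `merge` on the shared 'EmployeeID' column. The `how` argument controls which rows are kept; the names `inner`, `left`, and `outer` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
})
df2 = pd.DataFrame({
    'EmployeeID': [3, 4, 5, 6],
    'Department': ['HR', 'IT', 'Finance', 'Marketing']
})

# Inner join: only EmployeeIDs present in both DataFrames
inner = df1.merge(df2, on='EmployeeID', how='inner')
print(inner)

# Left join: every row of df1; Department is NaN where df2 has no match
left = df1.merge(df2, on='EmployeeID', how='left')
print(left)

# Outer join: every EmployeeID from either DataFrame
outer = df1.merge(df2, on='EmployeeID', how='outer')
print(outer)
```

An inner join is the default for `merge`, so `how='inner'` is spelled out here only for clarity.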