## **TEHREEM ZUBAIR**
## **TASK 12**

---
## **What is pandas?**
---
- Pandas is a powerful and popular data manipulation and analysis library for Python. 
- It is built on top of NumPy and provides data structures and functions needed to work with structured data seamlessly. 
- Pandas is particularly useful for data cleaning, transformation, visualization, and exploration.

---
## **Why do we need pandas?**
---
- Pandas functions are designed to be intuitive and easy to use, making complex data operations more accessible.
- Pandas integrates well with other data science libraries like Matplotlib, Seaborn, and Scikit-Learn, enhancing its capabilities for data analysis and visualization.
- It helps get data ready for machine learning: pandas simplifies handling missing data, merging datasets, reshaping data, and more.


In [2]:
import pandas as pd
import numpy as np

---
## **PANDAS SERIES**

A Pandas Series is a one-dimensional array-like object that can hold data of any type. It is similar to a column in a spreadsheet or a database table.

---

### **CREATING A PANDAS SERIES**
- We can create a Pandas Series from a Python list, a NumPy array, or a dictionary.
- By default, the index is 0, 1, 2, ...; when the Series is built from a dictionary, the keys become the index.

In [2]:
# From a python list
data_list = [10, 20, 30, 40, 50]
series = pd.Series(data_list)
print("Series from list:\n", series)

Series from list:
 0    10
1    20
2    30
3    40
4    50
dtype: int64


In [3]:
# From a numpy array
np_array = np.array([10, 20, 30, 40, 50])
series = pd.Series(np_array)
print("Series from numpy array:\n", series)

Series from numpy array:
 0    10
1    20
2    30
3    40
4    50
dtype: int64


In [4]:
# From a dictionary
data_dict = {'a': 10, 'b':20, 'c':30, 'd':40, 'e':50}
series = pd.Series(data_dict)
print("Series from dictionary: \n", series)

Series from dictionary: 
 a    10
b    20
c    30
d    40
e    50
dtype: int64


### **ASSIGNING A CUSTOM INDEX**

In [6]:
data_list = [10, 20, 30, 40, 50]
series = pd.Series(data_list, index=['a', 'b', 'c', 'd', 'e'])
print("\nSeries with custom index:\n", series)


Series with custom index:
 a    10
b    20
c    30
d    40
e    50
dtype: int64


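### **BASIC ARITHMETIC OPERATIONS**
- You can directly perform arithmetic operations like addition (+), subtraction (-), multiplication (*), and division (/) on Pandas Series.
- The Series do not have to be the same length: pandas matches values by index label, and a label present in only one Series gives NaN (which is also why the results below are float64).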

In [8]:
# Creating two Series
s1 = pd.Series([10, 20, 30, 40])
s2 = pd.Series([1, 2, 3, 4, 5])

# Addition
s3 = s1 + s2
print("Addition:")
print(s3)

# Subtraction
s4 = s1 - s2
print("\nSubtraction:")
print(s4)

# Multiplication
s5 = s1 * s2
print("\nMultiplication:")
print(s5)

# Division
s6 = s1 / s2
print("\nDivision:")
print(s6)

Addition:
0    11.0
1    22.0
2    33.0
3    44.0
4     NaN
dtype: float64

Subtraction:
0     9.0
1    18.0
2    27.0
3    36.0
4     NaN
dtype: float64

Multiplication:
0     10.0
1     40.0
2     90.0
3    160.0
4      NaN
dtype: float64

Division:
0    10.0
1    10.0
2    10.0
3    10.0
4     NaN
dtype: float64
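When the lengths differ, the NaN can be avoided with the method forms of the operators (add(), sub(), mul(), div()), which accept a fill_value argument. A minimal sketch reusing the s1 and s2 values above: the label that exists in only one Series is treated as 0 before the operation.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30, 40])
s2 = pd.Series([1, 2, 3, 4, 5])

# add() aligns on the index like +, but treats the label missing from s1
# (label 4) as 0 before adding
total = s1.add(s2, fill_value=0)
print(total)

# sub(), mul() and div() accept fill_value in the same way
diff = s1.sub(s2, fill_value=0)
print(diff)
```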


### **BROADCASTING**
Operations between a Series and a scalar value will apply the operation element-wise across the Series.

In [9]:
# Broadcasting with scalar
s7 = s1 * 2
print("\nBroadcasting with scalar:")
print(s7)


Broadcasting with scalar:
0    20
1    40
2    60
3    80
dtype: int64


In [10]:
# Series + Series: values are matched by index label, not by position
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10, 20, 30, 40], index=['b', 'c', 'd', 'e'])

result = s1 + s2
print(result)


a     NaN
b    12.0
c    23.0
d    34.0
e     NaN
dtype: float64


### **HANDLING MISSING VALUES**
Arithmetic involving NaN (Not a Number) values produces NaN. Missing values can be detected with isnull() and then handled explicitly, for example by removing them with dropna() or replacing them with fillna().

In [12]:
s = pd.Series([1, 2, None, 4, None])
# Check for NaN values
print(s.isnull())


0    False
1    False
2     True
3    False
4     True
dtype: bool


In [13]:
# DROPPING MISSING VALUES
s_without_nan = s.dropna()
print(s_without_nan)

0    1.0
1    2.0
3    4.0
dtype: float64


In [15]:
s_filled = s.fillna(0)  # Fill NaN with 0
print(s_filled)

0    1.0
1    2.0
2    0.0
3    4.0
4    0.0
dtype: float64
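A common variation is to fill the gaps with a statistic of the data instead of a constant. The sketch below reuses the same s, counts the missing values, and fills them with the mean of the remaining ones (mean() skips NaN by default).

```python
import pandas as pd

s = pd.Series([1, 2, None, 4, None])

# isnull() marks NaN as True; summing the booleans counts them
n_missing = s.isnull().sum()
print("Missing values:", n_missing)

# Fill NaN with the mean of the non-missing values: (1 + 2 + 4) / 3
s_mean_filled = s.fillna(s.mean())
print(s_mean_filled)
```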


In [16]:
# Handling NaN
s8 = pd.Series([10, 20, None, 40])
s9 = pd.Series([1, None, 3, 4])
s10 = s8 + s9
print("\nHandling NaN:")
print(s10)


Handling NaN:
0    11.0
1     NaN
2     NaN
3    44.0
dtype: float64


### **ACCESSING ELEMENTS**
We can access elements in a Series using index labels or positions.

In [17]:
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Using index labels
print("\nAccessing element with label 'a':", s1['a'])

# Using positions (.iloc is the explicit positional accessor)
print("Accessing element at position 0:", s1.iloc[0])



Accessing element with label 'a': 1
Accessing element at position 0: 1


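Recent pandas versions emit a FutureWarning for s1[0] on a Series with a string index, because using [] with an integer as a position on a labeled index is deprecated. The explicit accessors avoid the ambiguity: .loc for labels and .iloc for positions. A short sketch on the same s1:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Label-based access
print(s1.loc['a'])          # 1
print(s1.loc[['b', 'd']])   # several labels at once

# Position-based access (0 is the first element, -1 the last)
print(s1.iloc[0])           # 1
print(s1.iloc[-1])          # 4

# Slices: .loc includes the end label, .iloc excludes the end position
print(s1.loc['a':'c'])      # labels a, b, c
print(s1.iloc[0:2])         # labels a, b
```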


### **FILTERING VALUES**
Filtering values in Pandas Series involves selecting elements based on certain conditions. Here are some common ways to filter values in Pandas Series:

#### **BOOLEAN INDEXING**

In [18]:
s = pd.Series([10, 20, 30, 40, 50])

# Filter values greater than 30
s[s > 30]

3    40
4    50
dtype: int64

#### **USING METHODS**
Utilize methods like isin() to filter based on membership in a list or between() to filter within a range.

In [19]:
s = pd.Series(['apple', 'banana', 'cherry', 'date'])

# Filter values within a list
filtered = s[s.isin(['banana', 'date'])]
print(filtered)


1    banana
3      date
dtype: object


In [21]:
s = pd.Series(['apple', 'banana', 'cherry', 'date', 'mango', 'melon'])

# Filter values within an alphabetical range (both ends included)
filtered = s[s.between('banana', 'mango')]
print(filtered)


1    banana
2    cherry
3      date
4     mango
dtype: object
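between() includes both boundary values by default. Its inclusive parameter (which accepts 'both', 'neither', 'left' or 'right' in pandas 1.3 and later) changes that. A sketch on the same fruit list:

```python
import pandas as pd

s = pd.Series(['apple', 'banana', 'cherry', 'date', 'mango', 'melon'])

# For strings, between() compares alphabetically; both ends are kept by default
both = s[s.between('banana', 'mango')]

# inclusive='neither' drops the two boundary values themselves
neither = s[s.between('banana', 'mango', inclusive='neither')]
print(neither)
```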


#### **USING FUNCTIONS**
Apply custom functions with apply() to filter elements based on more complex conditions.

In [22]:
s = pd.Series([1, 2, 3, 4, 5])

# Filter values based on a function
def is_even(x):
    return x % 2 == 0

filtered = s[s.apply(is_even)]
print(filtered)

1    2
3    4
dtype: int64
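For simple conditions like this one, the same filter can be written without apply(): arithmetic and comparison operators work on the whole Series at once, which is usually shorter and faster than calling a Python function per element. A sketch comparing both forms:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Vectorized: % and == are applied to every element at once
evens = s[s % 2 == 0]

# Same filter with apply(), which calls the lambda once per element
evens_apply = s[s.apply(lambda x: x % 2 == 0)]

print(evens)
```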


#### **COMBINING CONDITIONS**
Use logical operators (&, |, ~) to combine multiple conditions.

In [23]:
s = pd.Series([10, 20, 30, 40, 50])
s[(s > 20) & (s < 50)]

2    30
3    40
dtype: int64
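The & case is shown above; | (or) and ~ (not) work the same way. Each condition needs its own parentheses, because & and | bind more tightly than comparisons such as > in Python. A sketch on the same s:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# | keeps values that satisfy at least one condition
outer = s[(s < 20) | (s > 40)]
print(outer)          # 10 and 50

# ~ inverts a condition: keep values that are NOT greater than 30
not_above_30 = s[~(s > 30)]
print(not_above_30)   # 10, 20 and 30
```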

---
## **DATAFRAMES**
---
- A DataFrame is a two-dimensional data structure: a table with labeled rows and columns, similar to a database table or an Excel spreadsheet.
- Each column is a Pandas Series, and all columns share the same row index.
- Conceptually, it behaves like a dictionary of columns, where each column has its own label (key), such as "last name" or "food group."
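A quick way to see the "columns are Series" idea in practice, using a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# Selecting one column gives a Series that shares the DataFrame's index
age = df['Age']
print(isinstance(age, pd.Series))     # True
print(age.index.equals(df.index))     # True

# shape is (rows, columns)
print(df.shape)                       # (3, 2)
```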

---
### **CREATING DATAFRAMES**
Creating a DataFrame in Pandas can be done in several ways, such as from a dictionary, from a list of lists, from NumPy arrays, or from another DataFrame. Here are some common methods to create a DataFrame:

#### **1. FROM DICTIONARY**
We can create a DataFrame from a dictionary where keys are column names and values are lists or arrays.

In [4]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pd.DataFrame(data)
df

|   | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |
| 3 | David | 40 | Houston |


#### **2. FROM LIST OF LISTS**
Create a DataFrame from a list of lists where each sublist represents a row.

In [6]:
data = [['Ali', 25, 'Lahore'],
        ['Ahmed', 26, 'Islamabad'],
        ['Zainab', 67, 'Peshawar']]

df = pd.DataFrame(data, columns = ['Name', 'Age', 'City'])
df

|   | Name | Age | City |
|---|---|---|---|
| 0 | Ali | 25 | Lahore |
| 1 | Ahmed | 26 | Islamabad |
| 2 | Zainab | 67 | Peshawar |


#### **3. FROM NUMPY ARRAYS**

In [9]:
data = np.array([['Ali', 25, 'Lahore'],
                 ['Ahmed', 26, 'Islamabad'],
                 ['Zainab', 67, 'Peshawar']])
df = pd.DataFrame(data, columns = ['Name', 'Age', 'City'])
df

|   | Name | Age | City |
|---|---|---|---|
| 0 | Ali | 25 | Lahore |
| 1 | Ahmed | 26 | Islamabad |
| 2 | Zainab | 67 | Peshawar |
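One thing to watch with this approach: a NumPy array holds a single dtype, so mixing names and numbers turns every value, including the ages, into strings. A sketch using two of the rows above, showing the array's dtype and converting Age back to numbers with pd.to_numeric:

```python
import numpy as np
import pandas as pd

data = np.array([['Ali', 25, 'Lahore'],
                 ['Ahmed', 26, 'Islamabad']])

# One array, one dtype: the ages were stored as Unicode strings
print(data.dtype.kind)   # U

df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])

# Convert Age back to numbers before doing arithmetic on it
df['Age'] = pd.to_numeric(df['Age'])
print(df['Age'].mean())  # 25.5
```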


#### **4. FROM ANOTHER DATAFRAME**
We can create a new DataFrame from an existing DataFrame by passing it to pd.DataFrame().

In [10]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df1 = pd.DataFrame(data)
df2 = pd.DataFrame(df1)
df2

|   | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |
| 3 | David | 40 | Houston |
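Note that pd.DataFrame(df1) is not guaranteed to copy the data: for DataFrame input, the constructor's copy parameter defaults to no copy, so df2 may share data with df1, and whether later changes to one show up in the other depends on the pandas version. When an independent copy is the goal, DataFrame.copy() states it explicitly. A small sketch:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# .copy() makes an independent DataFrame (a deep copy by default)
df2 = df1.copy()
df2.loc[0, 'Age'] = 99

print(df1.loc[0, 'Age'])   # 25, the original is unchanged
print(df2.loc[0, 'Age'])   # 99
```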


---
### **LOADING DATA FROM .csv FILES**
- CSV stands for Comma-Separated Values. 
- It is a plain text file format used to store tabular data, where each line represents a row, and each value within a line is separated by a comma.
- Loading data from a CSV file into a Pandas DataFrame is a common operation. 
- We can do it using the pd.read_csv().
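A self-contained way to try read_csv() without a file on disk is to wrap CSV text in io.StringIO, which read_csv accepts like an open file. The sample rows below reuse names from the earlier examples and are for illustration only:

```python
import io
import pandas as pd

# CSV text held in a string; io.StringIO makes it behave like a file
csv_text = """Name,Age,City
Ali,25,Lahore
Ahmed,26,Islamabad
Zainab,67,Peshawar
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)   # Age is parsed as an integer column
```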
 
For this part of the task, I used a CSV file from Kaggle containing an uncleaned dataset of flats.

In [7]:
df = pd.read_csv('/kaggle/input/flats-uncleaned-dataset/surat_uncleaned.csv')
df.head()   # will print first five rows of dataset

,property_name,areaWithType,square_feet,transaction,status,floor,furnishing,facing,description,price_per_sqft,price
0,2 BHK Apartment for Sale in Dindoli Surat,Carpet Area,644 sqft,New Property,Poss. by Oct '24,5 out of 10,Unfurnished,West,"Luxury project with basement parking, Solar ro...","₹2,891 per sqft",₹33.8 Lac
1,2 BHK Apartment for Sale in Althan Surat,Super Area,1278 sqft,New Property,Poss. by Jan '26,6 out of 14,Unfurnished,South -West,2 And 3 BHK Luxurious Flat for Sell In New Alt...,"₹3,551 per sqft",₹45.4 Lac
2,2 BHK Apartment for Sale in Pal Gam Surat,Super Area,1173 sqft,Resale,Ready to Move,5 out of 13,Semi-Furnished,East,This affordable 2 BHK flat is situated along a...,"₹3,800 per sqft",₹44.6 Lac
3,2 BHK Apartment for Sale in Jahangirabad Surat,Carpet Area,700 sqft,New Property,Ready to Move,6 out of 14,Unfurnished,East,2 BHK Flat For sell IN Jahangirabad Prime Loca...,"₹3,966 per sqft",₹47 Lac
4,"2 BHK Apartment for Sale in Orchid Fantasia, P...",Super Area,1250 sqft,Orchid Fantasia,New Property,Unfurnished,2,2,"Multistorey Apartment for Sale in Palanpur, Su...","₹3,600 per sqft",₹45 Lac


In [8]:
df.tail()  # last five rows

,property_name,areaWithType,square_feet,transaction,status,floor,furnishing,facing,description,price_per_sqft,price
4520,6 BHK Apartment for Sale in Millionaires Lifes...,Carpet Area,2000 sqft,New Property,Poss. by Dec '26,5 out of 12,Unfurnished,South - East,"Check out Millionaires Lifestyle in Vesu, one ...",,Call for Price
4521,"4 BHK Apartment for Sale in Savan Superia, Alt...",Super Area,3600 sqft,New Property,Poss. by Dec '25,5 out of 16,Unfurnished,South - East,Superia is a premium residential project launc...,,Call for Price
4522,5 BHK Apartment for Sale in Roongta Green Vall...,Carpet Area,2250 sqft,New Property,Poss. by Dec '25,7 out of 13,Unfurnished,North - East,"When it comes to beautiful homes, nothing beat...",,Call for Price
4523,"6 BHK Apartment for Sale in Cellestial Dreams,...",Carpet Area,3450 sqft,New Property,Ready to Move,7 out of 18,Unfurnished,North - West,"DRB Ravani Cellestial Dreams in Vesu, Surat is...",,Call for Price
4524,4 BHK Apartment for Sale in Roongta Green Vall...,Super Area,4500 sqft,New Property,Ready to Move,3 out of 12,Unfurnished,North,Roongta Green Valley is one of the popular res...,,Call for Price


---
#### **DATAFRAME SUMMARY**
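You can get a summary of a DataFrame with describe(). For numeric columns it reports the count, mean, standard deviation, minimum, quartiles (the 50% row is the median) and maximum. For text (object) columns, which is every column in this uncleaned dataset, it reports the count, the number of unique values, the most frequent value (top) and its frequency (freq) instead.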

In [9]:
# Summary statistics
df.describe()

,property_name,areaWithType,square_feet,transaction,status,floor,furnishing,facing,description,price_per_sqft,price
count,4525,4525,4525,4421,4524,4480,4185,3936,3154,4157,4525
unique,1992,6,1399,38,138,222,78,176,2588,2134,841
top,3 BHK Apartment for Sale in Vesu Surat,Super Area,1000 sqft,Resale,Ready to Move,Resale,Unfurnished,East,Multistorey apartment is available for sale. I...,"₹6,000 per sqft",Call for Price
freq,93,2599,77,2197,3078,431,2322,1487,35,64,173
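Since this raw dataset has no numeric columns yet, the numeric part of describe() is easier to see on a small made-up DataFrame. The ages are chosen so that the mean (40) and the median (32.5) differ:

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 35, 70],
                   'City': ['A', 'B', 'A', 'C']})

# By default only numeric columns are summarized
summary = df.describe()
print(summary)

# The '50%' row is the median, which can differ from the mean
print(summary.loc['mean', 'Age'], summary.loc['50%', 'Age'])   # 40.0 32.5

# include='all' also summarizes text columns (count, unique, top, freq)
print(df.describe(include='all'))
```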


---
#### **EXTRACTING A COLUMN**

In [11]:
df['status']

0       Poss. by Oct '24
1       Poss. by Jan '26
2          Ready to Move
3          Ready to Move
4           New Property
              ...       
4520    Poss. by Dec '26
4521    Poss. by Dec '25
4522    Poss. by Dec '25
4523       Ready to Move
4524       Ready to Move
Name: status, Length: 4525, dtype: object
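Single brackets return one column as a Series (as above), while a list of column names inside the brackets returns a DataFrame, even for a single column. A sketch with made-up values:

```python
import pandas as pd

# Made-up values for illustration
df = pd.DataFrame({'status': ['Ready to Move', 'Resale'],
                   'floor': ['5 out of 10', '6 out of 14']})

col_series = df['status']     # one name -> Series
col_frame = df[['status']]    # a list of names -> DataFrame

print(type(col_series).__name__)   # Series
print(type(col_frame).__name__)    # DataFrame
print(col_frame.shape)             # (2, 1)
```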

---
#### **ADDING/RENAMING/DELETING A COLUMN**

In [14]:
# adding a new column
df['NEW'] = True
df

,property_name,areaWithType,square_feet,transaction,status,floor,furnishing,facing,description,price_per_sqft,price,NEW
0,2 BHK Apartment for Sale in Dindoli Surat,Carpet Area,644 sqft,New Property,Poss. by Oct '24,5 out of 10,Unfurnished,West,"Luxury project with basement parking, Solar ro...","₹2,891 per sqft",₹33.8 Lac,True
1,2 BHK Apartment for Sale in Althan Surat,Super Area,1278 sqft,New Property,Poss. by Jan '26,6 out of 14,Unfurnished,South -West,2 And 3 BHK Luxurious Flat for Sell In New Alt...,"₹3,551 per sqft",₹45.4 Lac,True
2,2 BHK Apartment for Sale in Pal Gam Surat,Super Area,1173 sqft,Resale,Ready to Move,5 out of 13,Semi-Furnished,East,This affordable 2 BHK flat is situated along a...,"₹3,800 per sqft",₹44.6 Lac,True
3,2 BHK Apartment for Sale in Jahangirabad Surat,Carpet Area,700 sqft,New Property,Ready to Move,6 out of 14,Unfurnished,East,2 BHK Flat For sell IN Jahangirabad Prime Loca...,"₹3,966 per sqft",₹47 Lac,True
4,"2 BHK Apartment for Sale in Orchid Fantasia, P...",Super Area,1250 sqft,Orchid Fantasia,New Property,Unfurnished,2,2,"Multistorey Apartment for Sale in Palanpur, Su...","₹3,600 per sqft",₹45 Lac,True
...,...,...,...,...,...,...,...,...,...,...,...,...
4520,6 BHK Apartment for Sale in Millionaires Lifes...,Carpet Area,2000 sqft,New Property,Poss. by Dec '26,5 out of 12,Unfurnished,South - East,"Check out Millionaires Lifestyle in Vesu, one ...",,Call for Price,True
4521,"4 BHK Apartment for Sale in Savan Superia, Alt...",Super Area,3600 sqft,New Property,Poss. by Dec '25,5 out of 16,Unfurnished,South - East,Superia is a premium residential project launc...,,Call for Price,True
4522,5 BHK Apartment for Sale in Roongta Green Vall...,Carpet Area,2250 sqft,New Property,Poss. by Dec '25,7 out of 13,Unfurnished,North - East,"When it comes to beautiful homes, nothing beat...",,Call for Price,True
4523,"6 BHK Apartment for Sale in Cellestial Dreams,...",Carpet Area,3450 sqft,New Property,Ready to Move,7 out of 18,Unfurnished,North - West,"DRB Ravani Cellestial Dreams in Vesu, Surat is...",,Call for Price,True


Let's add a new column that holds the numeric value extracted from the strings in the square_feet column (for example, 644 from "644 sqft").

- In the code below, I used a regular expression to extract the numeric value from each string.
- r'(\d+)' captures the first run of one or more consecutive digits in each string.
- Missing values are filled with 0.
- The extracted values, which are strings at first, are then converted to integers.

In [16]:
# adding a new column
df['square_feet_num'] = df['square_feet'].str.extract(r'(\d+)', expand=False).fillna(0).astype(int)
df.head()

,property_name,areaWithType,square_feet,transaction,status,floor,furnishing,facing,description,price_per_sqft,price,NEW,square_feet_num
0,2 BHK Apartment for Sale in Dindoli Surat,Carpet Area,644 sqft,New Property,Poss. by Oct '24,5 out of 10,Unfurnished,West,"Luxury project with basement parking, Solar ro...","₹2,891 per sqft",₹33.8 Lac,True,644
1,2 BHK Apartment for Sale in Althan Surat,Super Area,1278 sqft,New Property,Poss. by Jan '26,6 out of 14,Unfurnished,South -West,2 And 3 BHK Luxurious Flat for Sell In New Alt...,"₹3,551 per sqft",₹45.4 Lac,True,1278
2,2 BHK Apartment for Sale in Pal Gam Surat,Super Area,1173 sqft,Resale,Ready to Move,5 out of 13,Semi-Furnished,East,This affordable 2 BHK flat is situated along a...,"₹3,800 per sqft",₹44.6 Lac,True,1173
3,2 BHK Apartment for Sale in Jahangirabad Surat,Carpet Area,700 sqft,New Property,Ready to Move,6 out of 14,Unfurnished,East,2 BHK Flat For sell IN Jahangirabad Prime Loca...,"₹3,966 per sqft",₹47 Lac,True,700
4,"2 BHK Apartment for Sale in Orchid Fantasia, P...",Super Area,1250 sqft,Orchid Fantasia,New Property,Unfurnished,2,2,"Multistorey Apartment for Sale in Palanpur, Su...","₹3,600 per sqft",₹45 Lac,True,1250
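A sketch of the same extraction on a few made-up values. It writes the pattern as a raw string (r'...'), which keeps Python from treating \d as a string escape (newer Python versions warn about invalid escapes like '\d' in ordinary strings), and uses expand=False so the single capture group comes back as a Series. Converting with pd.to_numeric instead of astype(int) is an assumption of this sketch, chosen to keep the conversion robust across pandas versions:

```python
import pandas as pd

# Made-up values in the same "<number> sqft" format, plus one missing entry
sizes = pd.Series(['644 sqft', '1278 sqft', None])

# Raw-string pattern; expand=False returns a Series for one capture group
digits = sizes.str.extract(r'(\d+)', expand=False)

# Digit strings become numbers; the missing entry stays NaN, then becomes 0
area = pd.to_numeric(digits, errors='coerce').fillna(0).astype(int)
print(area.tolist())   # [644, 1278, 0]
```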


Now that the numeric area is stored in its own column, let's delete the square_feet column, since we no longer need it.

- inplace=True modifies the existing DataFrame directly; no new DataFrame is created, and drop() returns None.

In [17]:
# deleting a column
df.drop(columns=['square_feet'], inplace = True)
df

,property_name,areaWithType,transaction,status,floor,furnishing,facing,description,price_per_sqft,price,NEW,square_feet_num
0,2 BHK Apartment for Sale in Dindoli Surat,Carpet Area,New Property,Poss. by Oct '24,5 out of 10,Unfurnished,West,"Luxury project with basement parking, Solar ro...","₹2,891 per sqft",₹33.8 Lac,True,644
1,2 BHK Apartment for Sale in Althan Surat,Super Area,New Property,Poss. by Jan '26,6 out of 14,Unfurnished,South -West,2 And 3 BHK Luxurious Flat for Sell In New Alt...,"₹3,551 per sqft",₹45.4 Lac,True,1278
2,2 BHK Apartment for Sale in Pal Gam Surat,Super Area,Resale,Ready to Move,5 out of 13,Semi-Furnished,East,This affordable 2 BHK flat is situated along a...,"₹3,800 per sqft",₹44.6 Lac,True,1173
3,2 BHK Apartment for Sale in Jahangirabad Surat,Carpet Area,New Property,Ready to Move,6 out of 14,Unfurnished,East,2 BHK Flat For sell IN Jahangirabad Prime Loca...,"₹3,966 per sqft",₹47 Lac,True,700
4,"2 BHK Apartment for Sale in Orchid Fantasia, P...",Super Area,Orchid Fantasia,New Property,Unfurnished,2,2,"Multistorey Apartment for Sale in Palanpur, Su...","₹3,600 per sqft",₹45 Lac,True,1250
...,...,...,...,...,...,...,...,...,...,...,...,...
4520,6 BHK Apartment for Sale in Millionaires Lifes...,Carpet Area,New Property,Poss. by Dec '26,5 out of 12,Unfurnished,South - East,"Check out Millionaires Lifestyle in Vesu, one ...",,Call for Price,True,2000
4521,"4 BHK Apartment for Sale in Savan Superia, Alt...",Super Area,New Property,Poss. by Dec '25,5 out of 16,Unfurnished,South - East,Superia is a premium residential project launc...,,Call for Price,True,3600
4522,5 BHK Apartment for Sale in Roongta Green Vall...,Carpet Area,New Property,Poss. by Dec '25,7 out of 13,Unfurnished,North - East,"When it comes to beautiful homes, nothing beat...",,Call for Price,True,2250
4523,"6 BHK Apartment for Sale in Cellestial Dreams,...",Carpet Area,New Property,Ready to Move,7 out of 18,Unfurnished,North - West,"DRB Ravani Cellestial Dreams in Vesu, Surat is...",,Call for Price,True,3450
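By default drop() leaves the original DataFrame alone and returns a new one, so a common alternative to inplace=True is reassignment (df = df.drop(columns=[...])). A small sketch of the difference, on a made-up one-row frame:

```python
import pandas as pd

# Made-up one-row frame with column names like the ones above
df = pd.DataFrame({'square_feet': ['644 sqft'], 'Area': [644]})

# Without inplace, drop() returns a new DataFrame and df is unchanged
smaller = df.drop(columns=['square_feet'])
print(list(df.columns))        # ['square_feet', 'Area']
print(list(smaller.columns))   # ['Area']

# With inplace=True, df itself changes and drop() returns None
result = df.drop(columns=['square_feet'], inplace=True)
print(result, list(df.columns))   # None ['Area']
```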


Now rename the square_feet_num column to Area.

In [22]:
df.rename(columns={'square_feet_num': 'Area'}, inplace = True)
df.head()

,property_name,areaWithType,transaction,status,floor,furnishing,facing,description,price_per_sqft,price,Area
0,2 BHK Apartment for Sale in Dindoli Surat,Carpet Area,New Property,Poss. by Oct '24,5 out of 10,Unfurnished,West,"Luxury project with basement parking, Solar ro...","₹2,891 per sqft",₹33.8 Lac,644
1,2 BHK Apartment for Sale in Althan Surat,Super Area,New Property,Poss. by Jan '26,6 out of 14,Unfurnished,South -West,2 And 3 BHK Luxurious Flat for Sell In New Alt...,"₹3,551 per sqft",₹45.4 Lac,1278
2,2 BHK Apartment for Sale in Pal Gam Surat,Super Area,Resale,Ready to Move,5 out of 13,Semi-Furnished,East,This affordable 2 BHK flat is situated along a...,"₹3,800 per sqft",₹44.6 Lac,1173
3,2 BHK Apartment for Sale in Jahangirabad Surat,Carpet Area,New Property,Ready to Move,6 out of 14,Unfurnished,East,2 BHK Flat For sell IN Jahangirabad Prime Loca...,"₹3,966 per sqft",₹47 Lac,700
4,"2 BHK Apartment for Sale in Orchid Fantasia, P...",Super Area,Orchid Fantasia,New Property,Unfurnished,2,2,"Multistorey Apartment for Sale in Palanpur, Su...","₹3,600 per sqft",₹45 Lac,1250


---
**Now that we have studied the basics of pandas Series and DataFrames, let's solve the tasks given in the assignment.**

**1. Create a Pandas Series from a Python list, numpy array, and a dictionary.**

In [24]:
data_list = [10, 20, 30, 40, 50]
series = pd.Series(data_list)
series

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [27]:
np_array = np.array([1, 2, 3, 4, 5])
series = pd.Series(np_array)
series

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [28]:
data_dict = {'A': 1,
             'B': 2,
             'C': 3,
             'D': 4,
             'E': 5}
series = pd.Series(data_dict)
series

A    1
B    2
C    3
D    4
E    5
dtype: int64

**2. Assign a custom index to the Series.**

In [31]:
np_array = np.array([11, 22, 33, 44, 55])
series = pd.Series(np_array, index = ['A', 'B', 'C', 'D', 'E'])
series

A    11
B    22
C    33
D    44
E    55
dtype: int64

**3. Perform basic arithmetic operations on Series.**

In [38]:
list1 = [0, 1, 2, 3, 4, 5, 6, 7]
list2 = [10, 20, 30, 40, 50, 60, 70, 80]

series1 = pd.Series(list1)
series2 = pd.Series(list2)

print("ADDITION: \n",series1 + series2)
print("SUBTRACTION: \n",series1 - series2)
print("MULTIPLICATION: \n",series1 * series2)
print("DIVISION: \n",series1 / series2)


ADDITION: 
 0    10
1    21
2    32
3    43
4    54
5    65
6    76
7    87
dtype: int64
SUBTRACTION: 
 0   -10
1   -19
2   -28
3   -37
4   -46
5   -55
6   -64
7   -73
dtype: int64
MULTIPLICATION: 
 0      0
1     20
2     60
3    120
4    200
5    300
6    420
7    560
dtype: int64
DIVISION: 
 0    0.000000
1    0.050000
2    0.066667
3    0.075000
4    0.080000
5    0.083333
6    0.085714
7    0.087500
dtype: float64


In [39]:
series1 * 2

0     0
1     2
2     4
3     6
4     8
5    10
6    12
7    14
dtype: int64

In [40]:
series2 / 3

0     3.333333
1     6.666667
2    10.000000
3    13.333333
4    16.666667
5    20.000000
6    23.333333
7    26.666667
dtype: float64

**4. Access elements using index labels and positions.**

In [43]:
list1 = [0, 1, 2, 3, 4, 5, 6, 7]

series1 = pd.Series(list1, index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])


# using index labels
series1[['a', 'e']]

a    0
e    4
dtype: int64

In [44]:
# using positions
series1.iloc[[1, 3, 5]]

b    1
d    3
f    5
dtype: int64

**5. Filter the Series to include only values greater than a specific threshold.**

In [47]:
data_list = [10, 20, 30, 40, 50, 60, 70, 80]
series = pd.Series(data_list)
series[(series > 30) & (series < 70)]

3    40
4    50
5    60
dtype: int64

**6. Create a DataFrame from a dictionary of lists.**

In [50]:
data_dict = {
    'Name' : ['Ali', 'Ahmed', 'Sara', 'Laiba', 'Laraib', 'Zara'],
    'Age' : [10, 20, 30, 40, 50, 60],
    'City' : ['Islamabad', 'Lahore', 'Quetta', 'Karachi', 'Swaat', 'Pindi'],
    'Gender' : ['Male', 'Male', 'Female', 'Female', 'Female', 'Female']
}

df = pd.DataFrame(data_dict)
df

|   | Name | Age | City | Gender |
|---|---|---|---|---|
| 0 | Ali | 10 | Islamabad | Male |
| 1 | Ahmed | 20 | Lahore | Male |
| 2 | Sara | 30 | Quetta | Female |
| 3 | Laiba | 40 | Karachi | Female |
| 4 | Laraib | 50 | Swaat | Female |
| 5 | Zara | 60 | Pindi | Female |


**7. Create a DataFrame from a numpy array, specifying column and index names.**

In [52]:
data = np.array([['Ali', 25, 'Lahore'],
                 ['Ahmed', 26, 'Islamabad'],
                 ['Zainab', 67, 'Peshawar'],
                 ['Zain', 43, 'Swaat'],
                 ['Zara', 32, 'Karachi']])
df = pd.DataFrame(data, columns = ['Name', 'Age', 'City'], index = ['Student1', 'Student2', 'Student3', 'Student4', 'Student5'])
df

|   | Name | Age | City |
|---|---|---|---|
| Student1 | Ali | 25 | Lahore |
| Student2 | Ahmed | 26 | Islamabad |
| Student3 | Zainab | 67 | Peshawar |
| Student4 | Zain | 43 | Swaat |
| Student5 | Zara | 32 | Karachi |


**8. Load a DataFrame from a CSV file.**

In [57]:
df = pd.read_csv('/kaggle/input/cars-dataset/cars_for_sale(uncleaned).csv')

**9. Display the first and last five rows of the DataFrame.**

In [81]:
df.head()

,Car,Condition,Mileage,Price,Basics Info,Vehicle History Info,Vehicle Reviews Info,Seller Rating,Seller Rating Count,Seller Address
0,2024 Lexus LC 500 Base,New,0 mi.,"$112,865MSRP $118,865","{'Exterior color': ' Caviar ', 'Interior color...",{},{},4.7,"(1,261 reviews)","1250 W Division St Chicago, IL 60642"
1,2007 Acura TSX Base,Used,"61,110 mi.","$11,295",{'Exterior color': ' Alabaster Silver Metallic...,{'Accidents or damage': 'At least 1 accident o...,{},4.2,(440 reviews),"1301 N Elston Ave Chicago, IL 60642"
2,2016 McLaren 675LT Base,Used,"6,305 mi.","$219,997$5,464 price drop","{'Exterior color': ' McLaren Orange ', 'Interi...",{'Accidents or damage': 'At least 1 accident o...,"{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",3.1,(421 reviews),"1561 N Fremont St Chicago, IL 60642"
3,2016 Audi TTS 2.0T quattro,Used,"65,715 mi.","$23,999","{'Exterior color': ' Black ', 'Interior color'...",{'Accidents or damage': 'At least 1 accident o...,"{'Comfort': '3.0', 'Interior': '5.0', 'Perform...",3.6,(123 reviews),"560 E North Ave West Chicago, IL 60185"
4,2018 BMW 740e xDrive iPerformance,Used,"19,830 mi.","$39,799$100 price drop","{'Exterior color': ' Imperial Blue Metallic ',...","{'Accidents or damage': 'None reported', 'Clea...","{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",4.4,(91 reviews),"6539 Ogden Ave Berwyn, IL 60402"


In [59]:
df.tail()

,Car,Condition,Mileage,Price,Basics Info,Vehicle History Info,Vehicle Reviews Info,Seller Rating,Seller Rating Count,Seller Address
9241,,,,,{},{},{},,,
9242,2022 BMW X3 xDrive30i,Used,"48,804 mi.","$27,979$998 price drop","{'Exterior color': ' Dark Graphite Metallic ',...","{'Accidents or damage': 'None reported', '1-ow...",{},4.8,"(3,739 reviews)","1313 Rand Road Des Plaines, IL 60016"
9243,2024 GMC Sierra 1500 Pro,New,3 mi.,"$51,080MSRP $51,080","{'Exterior color': ' Summit White ', 'Interior...",{},"{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",4.5,"(3,896 reviews)","8425 159th St Tinley Park, IL 60487"
9244,2012 GMC Terrain SLT-1,Used,"146,694 mi.","$8,995","{'Exterior color': ' Black ', 'Interior color'...",{'Accidents or damage': 'At least 1 accident o...,{},,,"13840 South Pulaski Road Crestwood, IL 60445"
9245,2024 Subaru Outback Touring XT,New,4 mi.,"$42,014",{'Exterior color': ' Magnetite Gray Metallic '...,{},"{'Comfort': '4.0', 'Interior': '4.0', 'Perform...",,,"1350 Park Ave W Highland Park, IL 60035"


**10. Get a summary of the DataFrame including the mean, median, and standard deviation of numeric columns.**

In [60]:
df.describe()

|   | Seller Rating |
|---|---|
| count | 7716.0 |
| mean | 4.383307 |
| std | 0.609811 |
| min | 1.3 |
| 25% | 4.2 |
| 50% | 4.6 |
| 75% | 4.8 |
| max | 5.0 |


**11. Extract a specific column as a Series.**

In [74]:
Mileage = df['Mileage']
Mileage

0              0 mi.
1         61,110 mi.
2          6,305 mi.
3         65,715 mi.
4         19,830 mi.
            ...     
9241             NaN
9242      48,804 mi.
9243           3 mi.
9244     146,694 mi.
9245           4 mi.
Name: Mileage, Length: 9246, dtype: object

**12. Filter rows based on column values.**

In [84]:
filtered_df = df[(df['Condition'] == 'Used')]
filtered_df.head()

,Car,Condition,Mileage,Price,Basics Info,Vehicle History Info,Vehicle Reviews Info,Seller Rating,Seller Rating Count,Seller Address
1,2007 Acura TSX Base,Used,"61,110 mi.","$11,295",{'Exterior color': ' Alabaster Silver Metallic...,{'Accidents or damage': 'At least 1 accident o...,{},4.2,(440 reviews),"1301 N Elston Ave Chicago, IL 60642"
2,2016 McLaren 675LT Base,Used,"6,305 mi.","$219,997$5,464 price drop","{'Exterior color': ' McLaren Orange ', 'Interi...",{'Accidents or damage': 'At least 1 accident o...,"{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",3.1,(421 reviews),"1561 N Fremont St Chicago, IL 60642"
3,2016 Audi TTS 2.0T quattro,Used,"65,715 mi.","$23,999","{'Exterior color': ' Black ', 'Interior color'...",{'Accidents or damage': 'At least 1 accident o...,"{'Comfort': '3.0', 'Interior': '5.0', 'Perform...",3.6,(123 reviews),"560 E North Ave West Chicago, IL 60185"
4,2018 BMW 740e xDrive iPerformance,Used,"19,830 mi.","$39,799$100 price drop","{'Exterior color': ' Imperial Blue Metallic ',...","{'Accidents or damage': 'None reported', 'Clea...","{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",4.4,(91 reviews),"6539 Ogden Ave Berwyn, IL 60402"
5,2019 Jeep Cherokee Limited,Used,"52,245 mi.","$22,966$173 price drop","{'Exterior color': ' Diamond Black ', 'Interio...",{'Accidents or damage': 'At least 1 accident o...,{},4.2,"(3,030 reviews)","6750 W Grand Ave Chicago, IL 60707"


In [83]:
filtered_df = df[(df['Seller Rating'] > 4.0)]
filtered_df.head()

,Car,Condition,Mileage,Price,Basics Info,Vehicle History Info,Vehicle Reviews Info,Seller Rating,Seller Rating Count,Seller Address
0,2024 Lexus LC 500 Base,New,0 mi.,"$112,865MSRP $118,865","{'Exterior color': ' Caviar ', 'Interior color...",{},{},4.7,"(1,261 reviews)","1250 W Division St Chicago, IL 60642"
1,2007 Acura TSX Base,Used,"61,110 mi.","$11,295",{'Exterior color': ' Alabaster Silver Metallic...,{'Accidents or damage': 'At least 1 accident o...,{},4.2,(440 reviews),"1301 N Elston Ave Chicago, IL 60642"
4,2018 BMW 740e xDrive iPerformance,Used,"19,830 mi.","$39,799$100 price drop","{'Exterior color': ' Imperial Blue Metallic ',...","{'Accidents or damage': 'None reported', 'Clea...","{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",4.4,(91 reviews),"6539 Ogden Ave Berwyn, IL 60402"
5,2019 Jeep Cherokee Limited,Used,"52,245 mi.","$22,966$173 price drop","{'Exterior color': ' Diamond Black ', 'Interio...",{'Accidents or damage': 'At least 1 accident o...,{},4.2,"(3,030 reviews)","6750 W Grand Ave Chicago, IL 60707"
6,2019 Cadillac CT6 3.6L Luxury,Used,"94,008 mi.","$26,995$183 price drop","{'Exterior color': ' Black ', 'Interior color'...",{'Accidents or damage': 'At least 1 accident o...,{},4.1,(50 reviews),"7158 Harlem Ave Bridgeview, IL 60455"


**13. Select rows based on multiple conditions.**

In [109]:
df['Seller Rating Count'].dtype

dtype('int64')

In [111]:
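# Note: 'Price' still holds strings such as '$11,295', so < compares them alphabetically, not as numbers
filtered_df = df[(df['Condition'] == 'Used') & (df['Price'] < '$50,000')]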
filtered_df

,Car,Condition,Mileage,Price,Basics Info,Vehicle History Info,Vehicle Reviews Info,Seller Rating,Seller Rating Count,Seller Address
1,2007 Acura TSX Base,Used,61,"$11,295",{'Exterior color': ' Alabaster Silver Metallic...,{'Accidents or damage': 'At least 1 accident o...,{},4.2,440,"1301 N Elston Ave Chicago, IL 60642"
2,2016 McLaren 675LT Base,Used,6,"$219,997$5,464 price drop","{'Exterior color': ' McLaren Orange ', 'Interi...",{'Accidents or damage': 'At least 1 accident o...,"{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",3.1,421,"1561 N Fremont St Chicago, IL 60642"
3,2016 Audi TTS 2.0T quattro,Used,65,"$23,999","{'Exterior color': ' Black ', 'Interior color'...",{'Accidents or damage': 'At least 1 accident o...,"{'Comfort': '3.0', 'Interior': '5.0', 'Perform...",3.6,123,"560 E North Ave West Chicago, IL 60185"
4,2018 BMW 740e xDrive iPerformance,Used,19,"$39,799$100 price drop","{'Exterior color': ' Imperial Blue Metallic ',...","{'Accidents or damage': 'None reported', 'Clea...","{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",4.4,91,"6539 Ogden Ave Berwyn, IL 60402"
5,2019 Jeep Cherokee Limited,Used,52,"$22,966$173 price drop","{'Exterior color': ' Diamond Black ', 'Interio...",{'Accidents or damage': 'At least 1 accident o...,{},4.2,3,"6750 W Grand Ave Chicago, IL 60707"
...,...,...,...,...,...,...,...,...,...,...
9227,2015 Chevrolet Trax LTZ,Used,103,"$10,250","{'Exterior color': ' Ruby Red Metallic ', 'Int...",{'Accidents or damage': 'At least 1 accident o...,{},4.4,13,"17W434 East Roosevelt Rd Oakbrook Terrace, IL ..."
9230,2021 Audi SQ5 3.0T Prestige,Used,40,"$41,795$600 price drop","{'Exterior color': ' Quantum Gray ', 'Interior...",{'Accidents or damage': 'At least 1 accident o...,{},2.9,43,"1811 N Rand Rd Palatine, IL 60074"
9237,2012 Mercedes-Benz SLS AMG Base,Used,6,"$144,800$5,000 price drop","{'Exterior color': ' Zircon Red Metallic ', 'I...","{'Accidents or damage': 'None reported', '1-ow...","{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",4.7,715,"27w110 North Ave West Chicago, IL 60185"
9239,2016 Tesla Model X 90D,Used,79,"$30,952$636 price drop","{'Exterior color': ' Dark Blue ', 'Interior co...","{'Accidents or damage': 'None reported', 'Clea...","{'Comfort': '1.0', 'Interior': '1.0', 'Perform...",,0,"3325 W Montrose Ave Chicago, IL 60618"


In [113]:
filtered_df = df[(df['Seller Rating'] > 4.0) & (df['Mileage'] <= 20000)]
filtered_df

Unnamed: 0,Car,Condition,Mileage,Price,Basics Info,Vehicle History Info,Vehicle Reviews Info,Seller Rating,Seller Rating Count,Seller Address
0,2024 Lexus LC 500 Base,New,0,"$112,865MSRP $118,865","{'Exterior color': ' Caviar ', 'Interior color...",{},{},4.7,1,"1250 W Division St Chicago, IL 60642"
1,2007 Acura TSX Base,Used,61,"$11,295",{'Exterior color': ' Alabaster Silver Metallic...,{'Accidents or damage': 'At least 1 accident o...,{},4.2,440,"1301 N Elston Ave Chicago, IL 60642"
4,2018 BMW 740e xDrive iPerformance,Used,19,"$39,799$100 price drop","{'Exterior color': ' Imperial Blue Metallic ',...","{'Accidents or damage': 'None reported', 'Clea...","{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",4.4,91,"6539 Ogden Ave Berwyn, IL 60402"
5,2019 Jeep Cherokee Limited,Used,52,"$22,966$173 price drop","{'Exterior color': ' Diamond Black ', 'Interio...",{'Accidents or damage': 'At least 1 accident o...,{},4.2,3,"6750 W Grand Ave Chicago, IL 60707"
6,2019 Cadillac CT6 3.6L Luxury,Used,94,"$26,995$183 price drop","{'Exterior color': ' Black ', 'Interior color'...",{'Accidents or damage': 'At least 1 accident o...,{},4.1,50,"7158 Harlem Ave Bridgeview, IL 60455"
...,...,...,...,...,...,...,...,...,...,...
9235,2023 Jeep Gladiator Sport S,New,5,"$42,995MSRP $58,910",{'Exterior color': ' Granite Crystal Metallic ...,{},"{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",4.5,3,"1155 W Dundee Rd Arlington Heights, IL 60004"
9236,2024 Subaru Forester Premium,New,6,"$33,989","{'Exterior color': ' Autumn Green Metallic ', ...",{},"{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",4.5,221,"1911 N Rand Rd Palatine, IL 60074"
9237,2012 Mercedes-Benz SLS AMG Base,Used,6,"$144,800$5,000 price drop","{'Exterior color': ' Zircon Red Metallic ', 'I...","{'Accidents or damage': 'None reported', '1-ow...","{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",4.7,715,"27w110 North Ave West Chicago, IL 60185"
9242,2022 BMW X3 xDrive30i,Used,48,"$27,979$998 price drop","{'Exterior color': ' Dark Graphite Metallic ',...","{'Accidents or damage': 'None reported', '1-ow...",{},4.8,3,"1313 Rand Road Des Plaines, IL 60016"
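The same two-condition filter can also be written with `DataFrame.query`, which some readers find easier to scan. Below is a minimal sketch on a small made-up frame. The car names and numbers are hypothetical, not taken from the dataset. Column names that contain spaces have to be wrapped in backticks inside the query string.

```python
import pandas as pd

# Hypothetical sample rows; only the columns used by the filter are included
cars = pd.DataFrame({
    'Car': ['Car A', 'Car B', 'Car C', 'Car D'],
    'Seller Rating': [4.7, 3.1, 4.4, 4.2],
    'Mileage': [0, 6, 25000, 52],
})

# Boolean-mask version: each condition in parentheses, combined with &
mask_result = cars[(cars['Seller Rating'] > 4.0) & (cars['Mileage'] <= 20000)]

# query() version: `Seller Rating` needs backticks because of the space
query_result = cars.query("`Seller Rating` > 4.0 and Mileage <= 20000")

print(mask_result['Car'].tolist())       # ['Car A', 'Car D']
print(mask_result.equals(query_result))  # True
```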


**14. Add a new column to the DataFrame.**

Let's extract the exterior color into a separate column.

In [132]:
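# Capture the text between the quotes after 'Exterior color', returned as a Series, and trim the padding spaces
df['Exterior color'] = df['Basics Info'].str.extract(r"'Exterior color':\s*'([^']*)'", expand=False).str.strip()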
df.head()

Unnamed: 0,Car,Condition,Mileage,Price,Basics Info,Vehicle Reviews Info,Seller Rating,Seller Rating Count,Seller Address,Exterior color
0,2024 Lexus LC 500 Base,New,0,112,"{'Exterior color': ' Caviar ', 'Interior color...",{},4.7,1,"1250 W Division St Chicago, IL 60642",Caviar
1,2007 Acura TSX Base,Used,61,11,{'Exterior color': ' Alabaster Silver Metallic...,{},4.2,440,"1301 N Elston Ave Chicago, IL 60642",Alabaster Silver Metallic
2,2016 McLaren 675LT Base,Used,6,219,"{'Exterior color': ' McLaren Orange ', 'Interi...","{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",3.1,421,"1561 N Fremont St Chicago, IL 60642",McLaren Orange
3,2016 Audi TTS 2.0T quattro,Used,65,23,"{'Exterior color': ' Black ', 'Interior color'...","{'Comfort': '3.0', 'Interior': '5.0', 'Perform...",3.6,123,"560 E North Ave West Chicago, IL 60185",Black
4,2018 BMW 740e xDrive iPerformance,Used,19,39,"{'Exterior color': ' Imperial Blue Metallic ',...","{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",4.4,91,"6539 Ogden Ave Berwyn, IL 60402",Imperial Blue Metallic
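The capture group `([^']*)` takes everything between the quotes, including the padding spaces that the raw strings contain (for example `' Caviar '`). A small sketch with made-up strings in the same format shows `expand=False`, which returns a Series instead of a one-column DataFrame, and `.str.strip()` to remove that padding. Rows without a match become `NaN`.

```python
import pandas as pd

# Hypothetical strings mimicking the 'Basics Info' format
info = pd.Series([
    "{'Exterior color': ' Caviar ', 'Interior color': 'Black '}",
    "{'Exterior color': ' McLaren Orange ', 'Interior color': 'Black '}",
    "{}",  # no exterior color, so no match
])

pattern = r"'Exterior color':\s*'([^']*)'"

raw = info.str.extract(pattern, expand=False)  # Series; padding is kept
clean = raw.str.strip()                        # trim leading/trailing spaces

print(repr(raw.iloc[0]))           # ' Caviar '
print(clean.iloc[:2].tolist())     # ['Caviar', 'McLaren Orange']
print(clean.isna().iloc[2])        # True
```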


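Printing the raw `Basics Info` value of the first row shows that each dictionary is stored as plain text, which is why a regular expression is used to pull out the exterior color.

In [124]: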
df.loc[0, 'Basics Info']

"{'Exterior color': ' Caviar ', 'Interior color': 'Black ', 'Drivetrain': 'Rear-wheel Drive ', 'MPG': ' 15–25Based on EPA mileage ratings. Use for comparison purposes only. Actual mileage will vary depending on driving conditions, driving habits, vehicle maintenance, and other factors.', 'Fuel type': 'Gasoline ', 'Transmission': '10-Speed Automatic', 'Engine': '5.0L V8 32V PDI DOHC', 'VIN': ' JTHNPAAY2RA108024 ', 'Stock #': ' R087 ', 'Mileage': ' 0 mi. '}"

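The raw value above is the text of a Python dictionary, so another option is to parse it with the standard library's `ast.literal_eval` and then look up any key. This is an alternative to the regex approach, not what this notebook does. A minimal sketch on a shortened, hypothetical copy of such a string:

```python
import ast
import pandas as pd

# Shortened, hypothetical copies of 'Basics Info' values
info = pd.Series([
    "{'Exterior color': ' Caviar ', 'Drivetrain': 'Rear-wheel Drive ', 'Fuel type': 'Gasoline '}",
    "{}",
])

# Safely turn each string literal into a real dict (no arbitrary code is run)
parsed = info.apply(ast.literal_eval)

# .get avoids a KeyError when the key is missing; empty results become None
drivetrain = parsed.apply(lambda d: d.get('Drivetrain', '').strip() or None)

print(drivetrain.iloc[0])           # Rear-wheel Drive
print(pd.isna(drivetrain.iloc[1]))  # True
```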
**15. Delete a column from the DataFrame.**

In [None]:
df.drop(columns=['Vehicle History Info'], inplace=True)

In [134]:
df.head()

Unnamed: 0,Car,Condition,Mileage,Price,Basics Info,Vehicle Reviews Info,Seller Rating,Seller Rating Count,Seller Address,Exterior color
0,2024 Lexus LC 500 Base,New,0,112,"{'Exterior color': ' Caviar ', 'Interior color...",{},4.7,1,"1250 W Division St Chicago, IL 60642",Caviar
1,2007 Acura TSX Base,Used,61,11,{'Exterior color': ' Alabaster Silver Metallic...,{},4.2,440,"1301 N Elston Ave Chicago, IL 60642",Alabaster Silver Metallic
2,2016 McLaren 675LT Base,Used,6,219,"{'Exterior color': ' McLaren Orange ', 'Interi...","{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",3.1,421,"1561 N Fremont St Chicago, IL 60642",McLaren Orange
3,2016 Audi TTS 2.0T quattro,Used,65,23,"{'Exterior color': ' Black ', 'Interior color'...","{'Comfort': '3.0', 'Interior': '5.0', 'Perform...",3.6,123,"560 E North Ave West Chicago, IL 60185",Black
4,2018 BMW 740e xDrive iPerformance,Used,19,39,"{'Exterior color': ' Imperial Blue Metallic ',...","{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",4.4,91,"6539 Ogden Ave Berwyn, IL 60402",Imperial Blue Metallic
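With `inplace=True`, `drop` modifies `df` directly, so running that cell a second time raises a `KeyError` because the column is already gone. A small sketch on a hypothetical frame shows the non-inplace style, which returns a new DataFrame and leaves the original untouched, and `errors='ignore'`, which skips labels that do not exist.

```python
import pandas as pd

# Hypothetical frame containing the column to remove
frame = pd.DataFrame({
    'Car': ['Car A', 'Car B'],
    'Vehicle History Info': ['{}', '{}'],
    'Price': [100, 200],
})

# Returns a new DataFrame; 'frame' itself is unchanged
trimmed = frame.drop(columns=['Vehicle History Info'])

# Dropping the same column again would raise KeyError; errors='ignore' skips it
trimmed_again = trimmed.drop(columns=['Vehicle History Info'], errors='ignore')

print(list(frame.columns))          # ['Car', 'Vehicle History Info', 'Price']
print(list(trimmed.columns))        # ['Car', 'Price']
print(list(trimmed_again.columns))  # ['Car', 'Price']
```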


**16. Rename columns in the DataFrame.**

In [136]:
df.rename(columns={'Seller Address': 'Address'}, inplace=True)
df.head()

Unnamed: 0,Car,Condition,Mileage,Price,Basics Info,Vehicle Reviews Info,Seller Rating,Seller Rating Count,Address,Exterior color
0,2024 Lexus LC 500 Base,New,0,112,"{'Exterior color': ' Caviar ', 'Interior color...",{},4.7,1,"1250 W Division St Chicago, IL 60642",Caviar
1,2007 Acura TSX Base,Used,61,11,{'Exterior color': ' Alabaster Silver Metallic...,{},4.2,440,"1301 N Elston Ave Chicago, IL 60642",Alabaster Silver Metallic
2,2016 McLaren 675LT Base,Used,6,219,"{'Exterior color': ' McLaren Orange ', 'Interi...","{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",3.1,421,"1561 N Fremont St Chicago, IL 60642",McLaren Orange
3,2016 Audi TTS 2.0T quattro,Used,65,23,"{'Exterior color': ' Black ', 'Interior color'...","{'Comfort': '3.0', 'Interior': '5.0', 'Perform...",3.6,123,"560 E North Ave West Chicago, IL 60185",Black
4,2018 BMW 740e xDrive iPerformance,Used,19,39,"{'Exterior color': ' Imperial Blue Metallic ',...","{'Comfort': '5.0', 'Interior': '5.0', 'Perform...",4.4,91,"6539 Ogden Ave Berwyn, IL 60402",Imperial Blue Metallic
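`rename` accepts several old-to-new pairs in one dictionary, and `columns` can also take a function that is applied to every name. In the sketch below, the first call renames two columns explicitly. The second converts all names to lowercase snake_case, an illustrative convention rather than something this notebook applies. The frame is hypothetical and has no rows.

```python
import pandas as pd

frame = pd.DataFrame(columns=['Seller Address', 'Seller Rating', 'Exterior color'])

# Rename several columns at once; unlisted columns keep their names
renamed = frame.rename(columns={'Seller Address': 'Address', 'Seller Rating': 'Rating'})

# Pass a function to transform every column name
snake = frame.rename(columns=lambda name: name.lower().replace(' ', '_'))

print(list(renamed.columns))  # ['Address', 'Rating', 'Exterior color']
print(list(snake.columns))    # ['seller_address', 'seller_rating', 'exterior_color']
```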


---
## **MORE PRACTICE**

To get a better understanding of the topics I have learned so far, I am applying the same techniques to another dataset.

---

In [5]:
# read the csv file
df = pd.read_csv('/kaggle/input/countries/country_db.csv')

In [6]:
# print first five rows
df.head()

Unnamed: 0,country,continent,year,population,number_of_languages,year_index
0,Afghanistan,Asia,2018,37172386,5,1
1,Albania,Europe,2018,2866376,3,1
2,Algeria,Africa,2018,42228429,2,1
3,American Samoa,Oceania,2017,55620,3,1
4,Andorra,Europe,2018,77006,4,1


In [9]:
# Checking for missing values
df.isnull().sum()

country                0
continent              0
year                   0
population             0
number_of_languages    0
year_index             0
dtype: int64
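Every count above is 0, so this dataset has no missing values. For comparison, here is a sketch on a small hypothetical frame that does contain gaps. `isnull()` marks each missing cell as `True`, and `sum()` counts those `True` values per column. Adding `any(axis=1)` counts the rows that have at least one gap.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing cells
sample = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'population': [100, np.nan, 300],
    'number_of_languages': [2, 3, np.nan],
})

missing_per_column = sample.isnull().sum()                 # NaN count per column
rows_with_any_missing = sample.isnull().any(axis=1).sum()  # rows with at least one NaN

print(missing_per_column.tolist())  # [0, 1, 1]
print(rows_with_any_missing)        # 2
```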

In [10]:
# Print summary statistics for the numeric columns
df.describe()

Unnamed: 0,year,population,number_of_languages,year_index
count,203.0,203.0,203.0,203.0
mean,2017.600985,36975820.0,4.522167,1.0
std,2.052142,141076700.0,2.672421,0.0
min,2000.0,11508.0,1.0,1.0
25%,2018.0,1162728.0,2.0,1.0
50%,2018.0,6956071.0,4.0,1.0
75%,2018.0,25739300.0,6.0,1.0
max,2018.0,1392730000.0,12.0,1.0
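By default `describe()` summarizes only numeric columns, and its result is itself a DataFrame, so a single statistic can be read with `.loc`. In the output above, `year_index` has a standard deviation of 0.0 because every row holds the same value (1). A small sketch on made-up numbers; `include='all'` adds text columns such as `continent`, with `top` giving the most frequent value.

```python
import pandas as pd

# Hypothetical rows in the same shape as the country data
sample = pd.DataFrame({
    'continent': ['Asia', 'Europe', 'Europe', 'Africa'],
    'population': [10, 20, 30, 40],
    'year_index': [1, 1, 1, 1],
})

summary = sample.describe()  # numeric columns only

print(list(summary.columns))              # ['population', 'year_index']
print(summary.loc['mean', 'population'])  # 25.0
print(summary.loc['std', 'year_index'])   # 0.0

# include='all' also summarizes the text column
print(sample.describe(include='all').loc['top', 'continent'])  # Europe
```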


Extract rows where continent is 'Europe' and year is 2018

In [12]:
filtered_df = df[(df['continent'] == 'Europe') & (df['year'] == 2018)]
filtered_df

Unnamed: 0,country,continent,year,population,number_of_languages,year_index
1,Albania,Europe,2018,2866376,3,1
4,Andorra,Europe,2018,77006,4,1
11,Austria,Europe,2018,8847037,8,1
17,Belarus,Europe,2018,9485386,4,1
18,Belgium,Europe,2018,11422068,6,1
24,Bosnia and Herzegovina,Europe,2018,3323929,1,1
28,Bulgaria,Europe,2018,7024216,4,1
45,Croatia,Europe,2018,4089400,2,1
48,Czech Republic,Europe,2018,10625695,8,1
49,Denmark,Europe,2018,5797446,7,1
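To match several continents at once, `Series.isin` avoids chaining many `==` comparisons with `|`. A sketch on made-up rows; the pairing of 'Europe' and 'Oceania' is only an example, and the years are hypothetical.

```python
import pandas as pd

# Hypothetical rows
sample = pd.DataFrame({
    'country': ['Albania', 'Fiji', 'Chile', 'Austria'],
    'continent': ['Europe', 'Oceania', 'South America', 'Europe'],
    'year': [2018, 2018, 2018, 2017],
})

# Continent is any of the listed values AND year is 2018
picked = sample[sample['continent'].isin(['Europe', 'Oceania']) & (sample['year'] == 2018)]

print(picked['country'].tolist())  # ['Albania', 'Fiji']
```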


Extract rows where number_of_languages is greater than 2 and year is 2018

In [13]:
filtered_df = df[(df['number_of_languages'] > 2) & (df['year'] == 2018)]
filtered_df

Unnamed: 0,country,continent,year,population,number_of_languages,year_index
0,Afghanistan,Asia,2018,37172386,5,1
1,Albania,Europe,2018,2866376,3,1
4,Andorra,Europe,2018,77006,4,1
5,Angola,Africa,2018,30809762,9,1
7,Argentina,South America,2018,44494502,3,1
...,...,...,...,...,...,...
195,Uzbekistan,Asia,2018,32955400,6,1
196,Vanuatu,Oceania,2018,292680,3,1
198,Vietnam,Asia,2018,95540395,9,1
201,Zambia,Africa,2018,17351822,6,1
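The printed result is truncated with `...`, so it does not show how many countries matched. Because `True` counts as 1, summing the boolean mask gives the number of matching rows, `len()` of the filtered frame gives the same number, and the mask's mean gives the matching share. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical rows
sample = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D'],
    'number_of_languages': [5, 2, 9, 3],
    'year': [2018, 2018, 2017, 2018],
})

mask = (sample['number_of_languages'] > 2) & (sample['year'] == 2018)

print(int(mask.sum()))    # 2
print(len(sample[mask]))  # 2
print(mask.mean())        # 0.5  (share of rows that match)
```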


Extract rows where continent is 'Oceania' or population is greater than 30 million

In [14]:
filtered_df = df[(df['continent'] == 'Oceania') | (df['population'] > 30000000)]
filtered_df

Unnamed: 0,country,continent,year,population,number_of_languages,year_index
0,Afghanistan,Asia,2018,37172386,5,1
2,Algeria,Africa,2018,42228429,2,1
3,American Samoa,Oceania,2017,55620,3,1
5,Angola,Africa,2018,30809762,9,1
7,Argentina,South America,2018,44494502,3,1
...,...,...,...,...,...,...
193,United States,North America,2018,327167434,12,1
195,Uzbekistan,Asia,2018,32955400,6,1
196,Vanuatu,Oceania,2018,292680,3,1
197,Venezuela,South America,2014,30045134,3,1
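With `|`, a row is kept when either condition holds. The complement, rows that are neither in Oceania nor above 30 million, comes from applying `~` to the same mask, and the two pieces together cover the whole frame. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical rows
sample = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D'],
    'continent': ['Oceania', 'Asia', 'Europe', 'Africa'],
    'population': [50_000, 40_000_000, 2_000_000, 35_000_000],
})

either = (sample['continent'] == 'Oceania') | (sample['population'] > 30_000_000)

kept = sample[either]   # Oceania OR more than 30 million
rest = sample[~either]  # neither condition holds

print(kept['country'].tolist())              # ['A', 'B', 'D']
print(rest['country'].tolist())              # ['C']
print(len(kept) + len(rest) == len(sample))  # True
```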


Total population of all European countries in the dataset

In [15]:
europe_population_sum = df.loc[df['continent'] == 'Europe', 'population'].sum()
print(f"Sum of all populations where continent is Europe: {europe_population_sum}")

Sum of all populations where continent is Europe: 717374613
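The same kind of total can be computed for every continent in one step with `groupby`. The `describe()` output shows `year` ranging from 2000 to 2018, so such sums combine figures from different years rather than giving a single-year snapshot. A sketch on made-up numbers:

```python
import pandas as pd

# Hypothetical rows
sample = pd.DataFrame({
    'continent': ['Europe', 'Asia', 'Europe', 'Oceania', 'Asia'],
    'population': [100, 400, 250, 30, 600],
})

# Total population per continent, largest first
totals = sample.groupby('continent')['population'].sum().sort_values(ascending=False)

print(totals.index.tolist())  # ['Asia', 'Europe', 'Oceania']
print(totals.tolist())        # [1000, 350, 30]
print(totals['Europe'])       # 350
```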
