# Activity: Introduction to Pandas

In [None]:
import pandas as pd

data = pd.read_csv('https://raw.githubusercontent.com/dvddepennde/crops_public_data/main/Crop_recommendation.csv')
data

### **Showing data**

Show the first 5 rows of the DataFrame:

In [None]:
data.head(5) # or data[:5]

Show the last 5 rows of the DataFrame:

In [None]:
data.tail(5)

Show 5 random rows from the DataFrame:

In [None]:
data.sample(5)

Show the shape of the DataFrame, i.e. the number of rows and columns:

In [None]:
data.shape

Display the name of the columns:

In [None]:
data.columns

How to rename a column:

In [None]:
data.rename(columns={'temperature': 'TEMPERATURE'})
# rename() returns a new DataFrame. To make the change persistent, do one of the following:
# data = data.rename(columns={'temperature': 'TEMPERATURE'})
# or
# data.rename(columns={'temperature': 'TEMPERATURE'}, inplace=True)
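The effect of reassigning the result can be checked on a small hand-made DataFrame. This is a minimal sketch: the `toy` frame and its values are invented for illustration and are not part of the crop dataset.

```python
import pandas as pd

# Hypothetical toy data, not taken from the crop dataset
toy = pd.DataFrame({'temperature': [20.5, 23.1], 'ph': [6.5, 7.0]})

# rename() returns a new DataFrame; the original is left untouched
renamed = toy.rename(columns={'temperature': 'TEMPERATURE'})
columns_before = list(toy.columns)
columns_renamed = list(renamed.columns)
print(columns_before)
print(columns_renamed)

# Reassigning the result makes the change persistent
toy = toy.rename(columns={'temperature': 'TEMPERATURE'})
columns_after = list(toy.columns)
print(columns_after)
```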

### **Column Access**

Access the values of a single column:

In [None]:
data['ph']

Count how many times each distinct value appears in a single column:

In [None]:
data['label'].value_counts()
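As a small illustration of what `value_counts()` returns, the sketch below uses an invented `labels` Series (not the crop dataset). It also shows the `normalize=True` option, which reports proportions instead of raw counts.

```python
import pandas as pd

# Hypothetical toy labels, invented for illustration
labels = pd.Series(['rice', 'rice', 'maize', 'rice'], name='label')

counts = labels.value_counts()                # occurrences of each distinct value
shares = labels.value_counts(normalize=True)  # same, as a fraction of all rows

print(counts)
print(shares)
```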

Add a new column to the data:

In [None]:
data['columns'] = 'Example'
data.head(5)

Drop a column from the data:

In [None]:
del data['columns']
data.head(5)

Obtain a subset of columns and their values:

In [None]:
data[['N', 'P', 'K']]

### **Info Data**

Obtain relevant information about the columns in the DataFrame. For example, here we can observe:
* The datatype of each column.
* The number of non-null records in each column.
* The number of rows in the data.

In [None]:
data.info()

Obtain statistical information about our numerical variables:

In [None]:
data.describe()

Number of distinct values in each column:

In [None]:
data.nunique()

Number of NULL values per column:

In [None]:
data.isnull().sum()
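The dataset loaded above may well contain no missing values, in which case every count is 0. To see the method report something, here is a minimal sketch on an invented `toy` frame that deliberately contains gaps.

```python
import pandas as pd

# Hypothetical toy data with deliberate gaps, invented for illustration
toy = pd.DataFrame({
    'N': [90, None, 60],
    'label': ['rice', 'maize', None],
})

missing_per_column = toy.isnull().sum()       # one count per column
total_missing = int(missing_per_column.sum()) # overall number of missing cells

print(missing_per_column)
print(total_missing)
```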

Obtain the distinct values that a single column takes:

In [None]:
data['label'].unique()

Access a single row in the data:

In [None]:
data.iloc[2] # This is the third row: iloc uses zero-based positions.
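`iloc` selects by position, while `loc` selects by index label. The two coincide only when the index is the default 0, 1, 2, ... range. A minimal sketch, using an invented `toy` frame with a non-default index:

```python
import pandas as pd

# Hypothetical toy data with a custom index, invented for illustration
toy = pd.DataFrame({'ph': [6.5, 7.0, 5.8]}, index=[10, 20, 30])

third_by_position = toy.iloc[2]  # third row, regardless of its label
row_with_label_20 = toy.loc[20]  # the row whose index label is 20

print(third_by_position['ph'])
print(row_with_label_20['ph'])
```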

Obtain the maximum, minimum, and mean values in a column:

In [None]:
print(f"Maximum Temperature: {data['temperature'].max()}")
print(f"Minimum Temperature: {data['temperature'].min()}")
print(f"Mean Temperature: {data['temperature'].mean()}")

### **Mathematical operations with columns**

Sum of columns:

In [None]:
data['K'] + data['N']

Product with a scalar:

In [None]:
data['K'] * 2

### **Sorting Data**

Sort data by one or more columns:

In [None]:
data.sort_values(by=['K', 'N'], ascending=True).head(4)
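When sorting by several columns, `ascending` can also be a list with one flag per column. The sketch below uses an invented `toy` frame and sorts 'K' ascending while breaking ties by 'N' descending.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration
toy = pd.DataFrame({'K': [40, 40, 20], 'N': [10, 30, 50]})

# 'K' ascending; rows with equal 'K' are ordered by 'N' descending
result = toy.sort_values(by=['K', 'N'], ascending=[True, False])
print(result)
```

The original index labels travel with the rows, so they appear out of order after sorting.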

### **Filtering Data**

Filter data by an exact value of 'label':

In [None]:
data[data['label'] == 'orange'][:5]

Filter data with the greater-than operator:

In [None]:
data[data['humidity'] > 98][:5]

Obtain the number of records that satisfy a single condition:

In [None]:
print(f"Number of records with humidity greater than 98: {len(data[data['humidity'] > 98])}")
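An alternative way to count matches is to sum the boolean mask directly, since each `True` counts as 1. A minimal sketch on an invented `toy` frame:

```python
import pandas as pd

# Hypothetical toy data, invented for illustration
toy = pd.DataFrame({'humidity': [99.0, 97.5, 98.5]})

mask = toy['humidity'] > 98       # boolean Series: True where the condition holds
count_with_len = len(toy[mask])   # filter first, then count the rows
count_with_sum = int(mask.sum())  # True counts as 1, so the sum is the row count

print(count_with_len, count_with_sum)
```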

Obtain data with multiple conditions:

In [None]:
data[(data['label'] == 'orange') & (data['humidity'] > 94.7)] # To combine conditions with OR, use | instead of &
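The difference between `&` and `|` is easiest to see side by side. The sketch below uses an invented `toy` frame. Note that each condition is wrapped in parentheses, because `&` and `|` bind more tightly than comparison operators in Python.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration
toy = pd.DataFrame({
    'label': ['orange', 'orange', 'rice', 'rice'],
    'humidity': [95.0, 90.0, 96.0, 80.0],
})

is_orange = toy['label'] == 'orange'
is_humid = toy['humidity'] > 94.7

both = toy[is_orange & is_humid]    # rows satisfying both conditions
either = toy[is_orange | is_humid]  # rows satisfying at least one condition

print(both)
print(either)
```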

Obtain the rows whose value equals the maximum of the column:

In [None]:
data[data['humidity'] == data['humidity'].max()]

Filter the dataset to include only rows where the 'label' column contains the substring 'co':

In [None]:
# .sample() retrieves random rows
data[data['label'].str.contains('co')].sample(5)
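By default `str.contains` is case-sensitive and propagates missing values, which cannot be used directly as a filter. The sketch below, on an invented `labels` Series, shows the `case` and `na` parameters that handle both situations.

```python
import pandas as pd

# Hypothetical toy labels (including a missing one), invented for illustration
labels = pd.Series(['coconut', 'Coffee', None, 'rice'])

# case=False ignores upper/lower case; na=False treats missing values as "no match"
mask = labels.str.contains('co', case=False, na=False)
matches = labels[mask]

print(mask.tolist())
print(matches.tolist())
```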

## **Now it's your turn!**

### **Activities**

**1.** **Display the last 2 rows of the DataFrame:**

**2.** **Calculate the mean value of the column 'humidity':**

**3.** **Create a column named 'K_temperature' with the temperature in Kelvin (Celsius + 273), then sort the DataFrame in descending order by this new column.**

**4.** **Display the columns 'N', 'P', 'K' and 'temperature' for the rows whose 'label' is 'coffee' or whose 'humidity' is greater than or equal to 92.**

**5.** **Display the number of records that have a 'temperature' between 24 and 25 and 'humidity' greater than 90.**