## Basic Operations on DataFrames and Series

### Viewing and Inspecting Data

Pandas provides several functions to quickly and easily inspect your data.

In [None]:
import pandas as pd

In [None]:
# Sample DataFrame
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 32, 22, 35, 47],
    "Score": [85, 63, 77, 96, 54]
})

# Head
print(df.head(3))  # Returns the first 3 rows of the DataFrame

In [None]:
# Tail
print(df.tail(3))  # Returns the last 3 rows of the DataFrame

In [None]:
# Info
df.info()  # Prints a summary of the DataFrame, including the number of non-null entries in each column (info() returns None, so it needs no print())

In [None]:
# Describe
print(df.describe())  # Provides descriptive statistics of the DataFrame

### Indexing and Selection

Pandas provides several methods for indexing and selecting data, including label-based indexing with `.loc` and integer-position-based indexing with `.iloc`.

In [None]:
# iloc
print(df.iloc[3])  # Selects row by integer location

In [None]:
# loc
print(df.loc[3])  # Selects row by index label; df has the default labels 0-4, so this is the same row as iloc[3]

In [None]:
df2 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 32, 22, 35, 47],
    "Score": [85, 63, 77, 96, 54]
},
index=['first', 'second', 'third', 'fourth', 'fifth'])
print(df2)


In [None]:
df2.loc['third']
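
The difference between `.loc` and `.iloc` is easiest to see when the index labels are not integers. The following is a minimal sketch using a small hypothetical frame named `people` (not part of the tutorial data); it also shows that label slices include their end point while position slices do not.

```python
import pandas as pd

# Hypothetical frame with string labels, similar in shape to df2 above
people = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 32, 22]},
    index=["first", "second", "third"],
)

# Label-based: the row labelled 'second', the column labelled 'Age'
age_by_label = people.loc["second", "Age"]

# Position-based: the second row (position 1), the second column (position 1)
age_by_position = people.iloc[1, 1]

# Slices differ: .loc includes the end label, .iloc excludes the end position
rows_by_label = people.loc["first":"second"]   # two rows
rows_by_position = people.iloc[0:1]            # one row

print(age_by_label, age_by_position)
print(len(rows_by_label), len(rows_by_position))
```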

### Adding, Renaming, and Removing Columns

In pandas, it's easy to add, rename, and remove columns.

In [None]:
# Adding a new column
df['Grade'] = ['A', 'C', 'B', 'A', 'F']

Adding a single value to all rows of a new column

In [None]:
df['IsActive'] = True

In [None]:
df

Using list comprehension

In [None]:
values = [i for i in range(1,11,2)]
df['randomNum'] = values
df

Using other columns

In [None]:
df['Double_Age'] = df['Age'] * 2

In [None]:
df

In [None]:
# Renaming columns
df = df.rename(columns={'Name': 'SName'})  # Keys must match existing column names; unknown keys are silently ignored

In [None]:
df

In [None]:
# Removing columns
df.drop('Grade', axis=1, inplace=True)

In [None]:
df
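
The cells above modify `df` step by step. As an alternative, the same three operations can be chained so that each one returns a new DataFrame and the original stays untouched. This is only a sketch on a hypothetical frame named `students`; `assign()`, `rename()`, and `drop()` are standard DataFrame methods.

```python
import pandas as pd

# Hypothetical data, not the tutorial's df
students = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 32],
    "Grade": ["A", "C"],
})

# Each method returns a new DataFrame; `students` itself is left unchanged
result = (
    students
    .assign(Double_Age=students["Age"] * 2)   # add a column
    .rename(columns={"Name": "SName"})        # rename a column
    .drop(columns=["Grade"])                  # remove a column
)

print(result.columns.tolist())
print(students.columns.tolist())
```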

### Performing Operations on Columns

We can operate on columns directly, either overwriting their values or creating new columns derived from existing ones.

In [None]:
df['Double_Age'] = df['Age'] * 2 # Creates a new column 'Double_Age' with the original age multiplied by 2

In [None]:
df['Age'] = df['Age'] * 2 # Overwrites the 'Age' column with the original values multiplied by 2

### Handling Missing Values

Pandas provides several methods for detecting, removing, and replacing null values in dataframes.

In [None]:
# Checking for missing values
print(df.isna())  # Returns a DataFrame where each cell is either True or False depending on that cell's null status.

In [None]:
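df['Age'] = df['Age'].astype('float')  # Integer columns cannot hold NaN, so convert to float first
df.loc[0, 'Age'] = float('nan')  # Assign through .loc; chained indexing such as df['Age'].iloc[0] = ... may not modify df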

In [None]:
df

In [None]:
# Filling missing values
df.fillna(value=25, inplace=True)  # Fills NA/NaN values with the specified value (in this case, the number 25).

In [None]:
df

In [None]:
# Dropping rows with missing values
df.dropna(inplace=True)  # Removes rows where at least one element is missing.
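
The detect, fill, and drop steps can also be applied per column. The sketch below uses a hypothetical frame named `scores` with two missing cells; the fill values (the column mean for `Age`, 0 for `Score`) are arbitrary choices made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Age and one missing Score
scores = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25.0, np.nan, 22.0],
    "Score": [85.0, 63.0, np.nan],
})

# Count the missing cells in each column
missing_per_column = scores.isna().sum()

# A dict fills each column with its own replacement value
filled = scores.fillna({"Age": scores["Age"].mean(), "Score": 0})

# subset= drops a row only when the listed column is missing
dropped = scores.dropna(subset=["Score"])

print(missing_per_column)
print(filled)
print(dropped)
```

Neither `fillna()` nor `dropna()` changes `scores` here, because `inplace` defaults to `False`.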

### Replacing (Missing) Values
We can use the DataFrame's `replace()` method to swap any values we choose for others. This is useful for filling in missing values or for correcting problematic ones. `replace()` returns a new DataFrame and leaves the original unchanged unless you assign the result back.
There are many ways to specify which values to replace and with what. The simplest is a pair of single values:

In [None]:
df

In [None]:
df.replace(to_replace=96, value=98)

We can also use lists to replace multiple values at once.

In [None]:
df.replace(to_replace=[25,32], value=24)

Or we can use two lists, which must be the same length, to give a replacement value for each value in the first list:

In [None]:
df.replace(to_replace=[24,47], value=[25,50])
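
Another way to specify replacements is a dictionary. The sketch below, on a hypothetical frame named `grades`, shows a flat dict that applies to every column and a nested dict that limits the replacement to one column.

```python
import pandas as pd

# Hypothetical data; note that 25 appears in both columns
grades = pd.DataFrame({"Age": [25, 32, 22], "Score": [85, 63, 25]})

# A flat dict maps old values to new values in every column
everywhere = grades.replace({25: 26, 63: 65})

# A nested dict restricts the replacement to the named column
age_only = grades.replace({"Age": {25: 26}})

print(everywhere)
print(age_only)
```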

### Removing Duplicates
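We can use the DataFrame's `duplicated()` method to check which rows are duplicates. It returns a boolean Series in which a row is `True` if the values in all of its columns also occur in another row. The `keep` parameter controls which occurrence is left unmarked: `keep='first'` (the default) marks every occurrence except the first, and `keep='last'`, used below, marks every occurrence except the last: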

In [None]:
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Alice", "Alice"],
    "Age": [25, 32, 22, 35, 47, 25, 25],
    "Score": [85, 63, 77, 96, 54, 97, 85]
})
df


      Name  Age  Score
0    Alice   25     85
1      Bob   32     63
2  Charlie   22     77
3    David   35     96
4      Eve   47     54
5    Alice   25     97
6    Alice   25     85


In [None]:
df.duplicated(keep="last")

0     True
1    False
2    False
3    False
4    False
5    False
6    False
dtype: bool

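Once we've seen which rows are duplicated, we can remove them with the DataFrame's `drop_duplicates()` method. It returns a new DataFrame and leaves `df` itself unchanged. Below, `subset` restricts the comparison to the `Name` and `Age` columns, so all three Alice rows count as duplicates and only the last one is kept: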

In [None]:
df.drop_duplicates(keep='last',subset=['Name', 'Age'])

      Name  Age  Score
1      Bob   32     63
2  Charlie   22     77
3    David   35     96
4      Eve   47     54
6    Alice   25     85


In [None]:
df

      Name  Age  Score
0    Alice   25     85
1      Bob   32     63
2  Charlie   22     77
3    David   35     96
4      Eve   47     54
5    Alice   25     97
6    Alice   25     85
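
To compare the three settings of `keep` side by side, here is a minimal sketch on a hypothetical frame named `people`, in which rows 0, 2, and 3 are identical.

```python
import pandas as pd

# Hypothetical data: the Alice row appears three times
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "Alice"],
    "Age": [25, 32, 25, 25],
})

first = people.duplicated(keep="first").tolist()  # marks all but the first occurrence
last = people.duplicated(keep="last").tolist()    # marks all but the last occurrence
every = people.duplicated(keep=False).tolist()    # marks every occurrence

print(first)  # [False, False, True, True]
print(last)   # [True, False, True, False]
print(every)  # [True, False, True, True]

# With keep=False, drop_duplicates removes every row that has a duplicate
unique_only = people.drop_duplicates(keep=False)
print(unique_only)
```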


### Data Type Conversion

The `astype()` function enables you to convert data from one type to another.

In [None]:
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 32, 22, 35, 47],
    "Score": [85, 63, 77, 96, 54]
})
df.info()

In [None]:
# Converting a column data type
df['Age'] = df['Age'].astype('float')

In [None]:
df

In [None]:
df['Namee'] = ['a','b','c','d','e']

In [None]:
df
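
`astype()` also accepts a dictionary, which converts several columns in one call. The sketch below assumes a hypothetical frame named `raw` whose `Quantity` column was read in as text; it also shows that converting floats to integers truncates toward zero rather than rounding.

```python
import pandas as pd

# Hypothetical data: Quantity holds digits stored as strings
raw = pd.DataFrame({"Quantity": ["10", "25", "7"], "Price": [1.5, 2.0, 3.25]})

# Map each column name to its target dtype
converted = raw.astype({"Quantity": "int64", "Price": "float32"})
print(converted.dtypes)
print(converted["Quantity"].sum())  # numeric sum, now that the column is integer

# Float-to-integer conversion drops the fractional part
truncated = pd.Series([1.9, -1.9]).astype("int64").tolist()
print(truncated)  # [1, -1]
```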

### Simple DataFrame Operations

Pandas provides a wealth of functions and methods for general data manipulation and analysis.

In [None]:
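# Mean of numeric columns (numeric_only=True skips the string columns, which would otherwise raise a TypeError in pandas 2.0+)
print(df.mean(numeric_only=True))

In [None]:
# Minimum of numeric columns
print(df.min(numeric_only=True))

In [None]:
# Maximum of numeric columns
print(df.max(numeric_only=True))

In [None]:
# Standard deviation of numeric columns
print(df.std(numeric_only=True))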

To filter a DataFrame by a condition, first build a boolean Series by comparing a column against a value, then pass that Series inside square brackets to keep only the rows where it is `True`:

In [None]:
df['Score'] > 55

In [None]:
df[df['Score'] > 65]

In [None]:
# Age < 30 AND Score > 55; each condition needs its own parentheses because & binds more tightly than < and >
df[(df['Age'] < 30) & (df['Score'] > 55)]
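
Conditions can also be combined with `|` (or), negated with `~`, or built with `isin()`. The sketch below is self-contained and uses a frame named `df_f` (a hypothetical name) with the same values as the tutorial's sample data.

```python
import pandas as pd

df_f = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25.0, 32.0, 22.0, 35.0, 47.0],
    "Score": [85, 63, 77, 96, 54],
})

# OR: younger than 25, or a score above 90
young_or_top = df_f[(df_f["Age"] < 25) | (df_f["Score"] > 90)]

# NOT: everyone whose score is not below 65
not_low = df_f[~(df_f["Score"] < 65)]

# Membership test against a list of values
selected = df_f[df_f["Name"].isin(["Bob", "Eve"])]

# .loc accepts a boolean mask for rows plus a column label
names = df_f.loc[df_f["Score"] > 65, "Name"]

print(young_or_top["Name"].tolist())  # ['Charlie', 'David']
print(not_low["Name"].tolist())       # ['Alice', 'Charlie', 'David']
print(selected["Name"].tolist())      # ['Bob', 'Eve']
print(names.tolist())                 # ['Alice', 'Charlie', 'David']
```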

### Uses of Standard Deviation
* In manufacturing and process control, standard deviation can indicate the consistency of product or process performance.
* It helps in understanding the variability or volatility in a dataset. For instance, in finance, a higher standard deviation of stock returns signifies higher risk, as the price of the stock is more volatile.
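
As a small illustration of these points, the sketch below compares two hypothetical series, `steady` and `volatile`, that share the same mean but differ in spread. Note that pandas computes the sample standard deviation by default (`ddof=1`); pass `ddof=0` for the population version.

```python
import pandas as pd

# Hypothetical measurements with identical means
steady = pd.Series([49, 50, 51, 50])
volatile = pd.Series([20, 80, 35, 65])

print(steady.mean(), volatile.mean())  # both are 50.0

# The second series is far more spread out around that mean
print(steady.std())
print(volatile.std())

# Population standard deviation divides by n instead of n - 1
print(steady.std(ddof=0))
```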

### Grouping Data with Pandas

The `groupby()` function is a powerful tool that allows you to group your data based on the values of specific columns, then apply a function to the results.

**Code Example:**

In [None]:
df = pd.DataFrame({
    'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
    'Max Speed': [380., 370., 24., 26.],
    'Weight': [1, 1.5, 0.3, 0.5]
})

# Grouping the data by animal and finding the average for each group
df_grouped = df.groupby('Animal').mean()

print(df_grouped)

        Max Speed  Weight
Animal                   
Falcon      375.0    1.25
Parrot       25.0    0.40
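
A group can also be summarised with a different function per column. The following sketch uses named aggregation via `agg()`; the frame name `birds` and the output column names (`top_speed`, `mean_weight`, `n_birds`) are chosen here purely for illustration.

```python
import pandas as pd

birds = pd.DataFrame({
    "Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
    "Max Speed": [380.0, 370.0, 24.0, 26.0],
    "Weight": [1.0, 1.5, 0.3, 0.5],
})

# Each keyword is an output column: (source column, aggregation function)
summary = birds.groupby("Animal").agg(
    top_speed=("Max Speed", "max"),
    mean_weight=("Weight", "mean"),
    n_birds=("Weight", "count"),
)

print(summary)
```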


Remember, pandas is a powerful tool that can make data analysis tasks easier, more efficient, and more flexible. These are just the basics; as you get comfortable with pandas, you can start exploring more complex data manipulation and analysis.

## Exercise: Basic Operations with Pandas

Let's continue to work with the fruit vendor data. We will use the DataFrame that you created in the previous exercise. If you haven't done so, load the `store_1_stock.csv` file into a DataFrame.

### Exercise 4: Viewing and Inspecting Data

Inspect your DataFrame using the head(), tail(), info(), and describe() methods. What is the total quantity of fruits in the store? What are the top three fruits in the store by quantity?<small>

Hint: To get the total quantity use `df['Quantity'].sum()`. And for the top three fruits, use the `nlargest()` method. Here's an example on how to use it:

In [None]:
# This will return the 3 largest values in the 'Score' column
print(df.nlargest(3, 'Score'))

</small>

### Exercise 5: Indexing and Selection

Try to select the quantity of 'Bananas' from the DataFrame using both loc and iloc methods.
<small>
Hint: Assume that 'Fruit' is the index of the DataFrame. You can set it using `df.set_index('Fruit', inplace=True)`. Then, you can use loc like this: `df.loc['Bananas', 'Quantity']`. With iloc, pass the integer positions of the row and the column instead of their labels.
</small>

### Exercise 6: Adding, Renaming, and Removing Columns

Let's suppose the store is running a sale, and all fruits are sold at 90% of their original price.

- Add a new column named 'Sale_Price' to the DataFrame that is 90% of the original price.
- Rename the 'Price' column to 'Original_Price'.
- Remove the 'Color' column from the DataFrame.

### Exercise 7: Handling Missing Values

Let's say you found that the quantity of 'Grapes' is missing in your DataFrame.

- Add a new row for 'Grapes' with a missing quantity. You can add this row by creating a new DataFrame with the details of 'Grapes' and then concatenating it with the existing DataFrame using `pd.concat()`.
<small>
To add a new row you can use the following example:

In [None]:
import numpy as np

new_data = pd.DataFrame({"Fruit": ["Grapes"], "Quantity": [np.nan], "Price": [3.5]})
df = pd.concat([df, new_data], ignore_index=True) # pd.concat adds a new row

</small>
- Check if there are any missing values in the 'Quantity' column.

### Exercise 8: Data Type Conversion

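Check the data type of the 'Quantity' column using `df['Quantity'].dtype`. If it is not integer, convert it using `astype()`. Note that a column containing missing values cannot be converted to a plain integer type; fill or drop the missing values first, or use the nullable `'Int64'` dtype.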

### Exercise 9: Simple DataFrame Operations

Calculate the following:

- The total quantity of all fruits.
- The name and quantity of the fruit with the highest quantity.
- The name and quantity of the fruit with the lowest quantity.
- The average original price per kg of fruits.

<small>
Hint: For the second and third bullet points, consider using the `idxmax()` or `idxmin()` functions, or boolean indexing.
</small>
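
As a sketch of how `idxmax()` and `idxmin()` work, the example below uses a small hypothetical table named `stock`; the real `store_1_stock.csv` may have different columns and values. Both methods return the index label of the extreme value, which can then be passed to `.loc` to fetch the whole row.

```python
import pandas as pd

# Hypothetical stock table, for illustration only
stock = pd.DataFrame({
    "Fruit": ["Apples", "Bananas", "Cherries"],
    "Quantity": [40, 75, 12],
})

# idxmax/idxmin return the index label of the largest/smallest value
max_row = stock.loc[stock["Quantity"].idxmax()]
min_row = stock.loc[stock["Quantity"].idxmin()]

print(max_row["Fruit"], max_row["Quantity"])
print(min_row["Fruit"], min_row["Quantity"])
```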