# Python | Pandas DataFrame

### What is Pandas?

<b>pandas</b> is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. 

### What is a Pandas DataFrame?

A <b>Pandas DataFrame</b> is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In other words, the data in a DataFrame is arranged in a table of rows and columns.

A Pandas DataFrame consists of three principal components: the data, the rows, and the columns.

<img src="images/pandas.jpg">
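As a minimal sketch of these three components, the snippet below builds a small DataFrame and inspects its row labels, column labels, and underlying data. The sample names and ages are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"Name": ["Ada", "Bayo"], "Age": [21, 23]})

row_labels = list(df.index)        # the row labels (the index)
column_labels = list(df.columns)   # the column labels
values = df.to_numpy()             # the underlying data as a NumPy array

print(row_labels)
print(column_labels)
print(values)
```

When no index is supplied, pandas numbers the rows starting from 0.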

A Pandas DataFrame can be created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file.
A Pandas DataFrame can also be created from in-memory Python objects such as a list, a dictionary, or a list of dictionaries.
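The list-of-dictionaries case can be sketched as follows. Each dictionary becomes one row and the keys become column labels. The records here are hypothetical, and the third one deliberately omits a key to show how a missing value is handled.

```python
import pandas as pd

# Hypothetical records; each dictionary becomes one row
records = [
    {"Name": "Ada", "Age": 21},
    {"Name": "Bayo", "Age": 23},
    {"Name": "Chidi"},  # no 'Age' key: pandas fills the gap with NaN
]

df = pd.DataFrame(records)
print(df)
```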

A DataFrame can be created in several ways. Some of them are shown below:


### Creating a dataframe using List:

In [None]:
# Import the pandas package
import pandas as pd
 
# list of strings
lyst = ['CSC', '102', 'is', 'the', 'best', 'course', 'ever']
 
# Calling DataFrame constructor on list
df = pd.DataFrame(lyst)

# Print the output.
print(df)

### Creating a dataframe using dict of ndarrays/lists:

In [None]:
import pandas as pd

# Initialise a dictionary of lists (all lists must have the same length)
data = {
    'Name': ['Angela', 'Precious', 'Luis', 'Ade', 'John'],
    'Age': [20, 21, 19, 18, 20]
}

# Create the DataFrame; each key becomes a column
df = pd.DataFrame(data)

# Print the output
print(df)

### Column Selection:

In [None]:
# Import pandas package
import pandas as pd

# Define a dictionary containing employee data
data = {
    'Name': ['Clem', 'Prince', 'Edward', 'Adele'],
    'Age': [27, 24, 22, 32],
    'Address': ['Abuja', 'Kano', 'Minna', 'Lagos'],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd']
}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Select two columns by passing a list of column labels
print(df[['Name', 'Qualification']])

### Row Selection:
Pandas provides position-based indexing for retrieving rows from a DataFrame.<br>
The <i><font color="green">DataFrame.iloc[]</font></i> indexer retrieves rows from a Pandas DataFrame by integer position.<br>
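As a rough sketch of position-based selection, the snippet below picks a single row, a slice of rows, and the last row. The DataFrame contents are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"Name": ["Ada", "Bayo", "Chidi"], "Age": [21, 23, 22]})

first_row = df.iloc[0]     # a single position returns a Series
first_two = df.iloc[0:2]   # a slice returns a DataFrame (end position excluded)
last_row = df.iloc[-1]     # negative positions count from the end

print(first_row)
print(first_two)
print(last_row)
```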

In [None]:
import pandas as pd

# Define a dictionary containing employee data
data = {'Name':['Oyin', 'Mary', 'David', 'Bola'],
        'Age':[27, 24, 22, 32],
        'Address':['Asaba', 'Maiduguri', 'Onitsha', 'Kwara'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into DataFrame 
df = pd.DataFrame(data)

# Select the first row by its integer position
print(df.iloc[0])

### Read from a file:

In [None]:
# importing pandas package
import pandas as pd

# making data frame from csv file
try:
    data = pd.read_csv("employee_records.csv")
    print(data)
except FileNotFoundError:
    print("Error: File 'employee_records.csv' not found. Please check the file path.")
except Exception as e:
    print(f"An error occurred: {str(e)}")

### Select first row from file

In [None]:
# importing pandas package
import pandas as pd

try:
    # making data frame from csv file
    data = pd.read_csv("employee_records.csv")
    
    # Get the first row of data (returned as a Series)
    first_row = data.iloc[0]
    
    # Print the first row
    print(first_row)
    
except FileNotFoundError:
    print("Error: 'employee_records.csv' file not found.")
except Exception as e:
    print(f"An error occurred: {str(e)}")

### Selecting Row with Title Header

In [None]:
# importing pandas package
import pandas as pd

try:
    # making data frame from csv file
    data = pd.read_csv("bcg.csv")
    
    # head(1) returns a one-row DataFrame, so the column header is printed too
    print(data.head(1))
    
except FileNotFoundError:
    print("Error: 'bcg.csv' file not found. Please check the filename and path.")
except Exception as e:
    print(f"An error occurred: {str(e)}")

### Looping over rows and columns
A loop visits each item of a collection, one after another.<br> A Pandas DataFrame consists of rows and columns, so looping over a DataFrame means iterating over either its rows or its columns.<br><br>
To iterate over rows, we can use two functions: <i><font color="green">iterrows(), itertuples() </font></i>. Note that <i>iteritems()</i> iterated over columns rather than rows, and was removed in pandas 2.0 in favour of <i>items()</i>.
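As a small sketch of row iteration with `itertuples()`, the snippet below collects the names of everyone whose score reaches a pass mark. The data and the pass mark of 50 are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"name": ["Ada", "Bayo"], "score": [90, 40]})

passed = []
# Each row is a namedtuple; index=False leaves the row label out of it
for row in df.itertuples(index=False):
    if row.score >= 50:  # assumed pass mark
        passed.append(row.name)

print(passed)
```

`itertuples()` is generally faster than `iterrows()` because it does not build a Series for every row.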

In [None]:
# importing pandas as pd
import pandas as pd
  
# dictionary of lists
data_dict = {'name':["Abdul", "Chukwuemeka", "Seyi", "Matt"],
             'degree': ["MBA", "BCA", "M.Tech", "MBA"],
             'score':[90, 40, 80, 98]}
 
# creating a dataframe from a dictionary 
df = pd.DataFrame(data_dict)

# iterating over rows using iterrows() function 
for index, row in df.iterrows():
    print(f"Index: {index}")
    print(row)
    print("-" * 30)  # separator between rows
    

### Looping over Columns :
To loop over columns, we create a list of the DataFrame's column labels and then iterate through that list to pull out each column.
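An alternative worth knowing is `DataFrame.items()`, which yields each column label together with its column, so no separate list is needed. The sketch below uses hypothetical data and records the first value of every column.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"name": ["Ada", "Bayo"], "score": [90, 40]})

summary = {}
# items() yields (column label, column as a Series) pairs
for label, column in df.items():
    summary[label] = column.iloc[0]  # first value in this column

print(summary)
```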

In [None]:
# importing pandas as pd
import pandas as pd
   
# dictionary of lists
data = {'name':["Bello", "Kamara", "Ugochi", "David"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
  
# creating a dataframe from a dictionary 
df = pd.DataFrame(data)

# creating a list of dataframe columns
columns = list(df.columns)
 
for col in columns:
    # printing the third element of the column (position 2)
    print(f"Third element of {col}: {df[col].iloc[2]}")

### Saving a DataFrame as CSV file

In [None]:
# importing pandas as pd
import pandas as pd
   
# dictionary of lists
records = {'name':["Abel", "Kamsi", "Oyode", "Chinelo"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
  
# creating a dataframe from a dictionary 
df = pd.DataFrame(records)

try:
    # saving the dataframe; index=False leaves the row labels out of the file
    df.to_csv('record.csv', index=False)
    print("Data successfully saved to record.csv")
except Exception as e:
    print(f"Error saving file: {str(e)}")

## Class Project I


#### Go to www.kaggle.com

Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

#### Download the following datasets:
1. Top Apps in Google Play
2. Cryptocurrency Predict Artificial Intelligence V3
3. Programming Languages Trend Over Time

#### Clue
You can sign in with a Google, Facebook, or LinkedIn account.

#### Task
Display the first 7 rows of each dataset.<br>
Select the first 3 columns of each dataset.<br>
Display only one row and the header of each dataset.


In [None]:
import pandas as pd

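def analyze_dataset(file_path):
    """Print the views of one CSV dataset that the task asks for."""
    try:
        # Load the dataset from a CSV file
        df = pd.read_csv(file_path)

        print(f"\nAnalyzing dataset: {file_path}")
        print("\nFirst 7 rows:")
        print(df.head(7))

        # Select all rows and the first 3 columns by position (preview only)
        print("\nFirst 3 columns:")
        print(df.iloc[:, :3].head())

        # head(1) keeps the column header alongside the single row
        print("\nOne row with headers:")
        print(df.head(1))

    except FileNotFoundError:
        print(f"Error: File {file_path} not found")
    except Exception as e:
        print(f"An error occurred with {file_path}: {str(e)}")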


datasets = [
    "top_apps_google_play.csv",
    "cryptocurrency_predict_ai_v3.csv",
    "programming_languages_trend.csv"
]

for dataset in datasets:
    analyze_dataset(dataset)

## Class Project II

<b>Cadbury Nigeria Plc</b> manufactures and sells branded fast-moving consumer goods to the Nigerian market and exports in West Africa. The Company produces intermediate products, such as cocoa butter, liquor, cake and powder. It exports cocoa butter, cake and liquor to international customers, and sells cocoa powder locally. It operates through three segments: Refreshment Beverages, Confectionery and Intermediate Cocoa Products. The Refreshment Beverages segment includes the manufacture and sale of Bournvita and Hot Chocolate. The Confectionery segment includes the manufacture and sale of Tom Tom and Buttermint. The Intermediate Cocoa Products segment includes the manufacture and sale of cocoa powder, cocoa butter, cocoa liquor and cocoa cake. The Refreshment Beverages' brands include CADBURY BOURNVITA and CADBURY 3-in-1 HOT CHOCOLATE. The Confectionery's brands include TOMTOM CLASSIC, TOMTOM STRAWBERRY and BUTTERMINT. The Intermediate Cocoa Products' brands include COCOA POWDER and COCOA BUTTER.

You have been employed as an expert Python developer to create a program that documents the consumption categories of their products and brands. Using your knowledge of Pandas DataFrames, develop a program that saves the list of products (export, segments and brands) in a .csv file.<br><br>
Hint: save the filename as <font color="green"><i>cadbury_market.csv</i></font>.

In [None]:
import pandas as pd


data = {
    'Segment': ['Refreshment Beverages', 'Refreshment Beverages', 
                'Confectionery', 'Confectionery', 'Confectionery',
                'Intermediate Cocoa Products', 'Intermediate Cocoa Products',
                'Intermediate Cocoa Products', 'Intermediate Cocoa Products'],
    'Product': ['Beverage', 'Beverage', 
               'Candy', 'Candy', 'Candy',
               'Cocoa Product', 'Cocoa Product', 
               'Cocoa Product', 'Cocoa Product'],
    'Brand': ['CADBURY BOURNVITA', 'CADBURY 3-in-1 HOT CHOCOLATE',
              'TOMTOM CLASSIC', 'TOMTOM STRAWBERRY', 'BUTTERMINT',
              'COCOA POWDER', 'COCOA BUTTER', 
              'COCOA LIQUOR', 'COCOA CAKE'],
    # Per the description, cocoa butter, liquor and cake are exported;
    # cocoa powder is sold locally
    'Export': ['No', 'No', 
              'No', 'No', 'No',
              'No', 'Yes', 
              'Yes', 'Yes']
}


cadbury_df = pd.DataFrame(data)

try:
   
    cadbury_df.to_csv('cadbury_market.csv', index=False)
    print("Cadbury product documentation successfully saved to 'cadbury_market.csv'")
    
    
    print("\nSaved Data Preview:")
    print(cadbury_df.head())
    
except Exception as e:
    print(f"Error saving file: {str(e)}")