# Introduction to Pandas

* Pandas is a powerful data manipulation and analysis library for Python. It provides data structures such as Series and DataFrame that make it easy to clean, transform, and analyze large datasets, and it integrates seamlessly with other Python libraries such as NumPy and Matplotlib. It also offers functions for data transformation, aggregation, and visualization, which are crucial for effective analysis.

* Pandas revolves around two primary data structures: the Series (1D), which holds a single column of values, and the DataFrame (2D), which holds tabular data made up of rows and columns.

* Think of it like a super-powered spreadsheet that can hold a lot of information in an organized way. It helps you store and manipulate data, like sorting it, filtering out certain details, or even combining information from different sources.

* For example, imagine you have a list of students and their scores in different subjects. With Pandas, you can easily find the highest score, calculate averages, or even group students by their scores. You can also clean up messy data, for example by removing empty or incorrect entries, so everything is neat and ready for analysis.
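
The student-scores scenario above can be sketched in a few lines. The names, subjects, and scores below are made up for illustration, and grouping by subject (rather than by score) is an assumption chosen to keep the output small.

```python
import pandas as pd

# Hypothetical student scores; one value is missing on purpose
scores = pd.DataFrame({
    'student': ['Asha', 'Ben', 'Chen', 'Dina'],
    'subject': ['Math', 'Math', 'Science', 'Science'],
    'score': [88.0, 92.0, None, 75.0],
})

# Clean up: drop rows where the score is missing
clean = scores.dropna(subset=['score'])

# Highest score and overall average across the remaining rows
highest = clean['score'].max()
average = clean['score'].mean()

# Average score per subject
by_subject = clean.groupby('subject')['score'].mean()

print(highest)
print(average)
print(by_subject)
```

Here `dropna` plays the role of the "clean up" step, and `groupby` does the grouping before the average is taken.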




# Creating a Pandas DataFrame
* A Pandas DataFrame is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be created from lists, a dictionary, a list of dictionaries, and so on.

* Imagine you have a table of data, like an Excel sheet or a list of information. Each row in the table represents a person or an item, and each column holds a specific type of information about them, like their name, age, or score.

* In the world of programming, we can create this type of table (called a DataFrame) using something called Pandas, which is like a super-powered spreadsheet. Here's how you can think of it:

* Create the table (DataFrame):
Think of a DataFrame as a big box where you can store lots of rows and columns. Each column can have a different kind of information (like numbers, words, or dates), and each row represents a single record, like a person or an event.

* Putting data into the table:
You create a DataFrame by telling it what kind of information you have. For example, you could say, "I have a list of names and ages," and Pandas will organize that data into columns for you.
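
As a rough sketch of two of the sources mentioned above, the example below reads CSV text and builds a DataFrame from a list of dictionaries. The CSV contents and the `name`/`age` fields are hypothetical, and `io.StringIO` stands in for a real file path so the snippet runs without any file on disk.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file on disk (hypothetical contents)
csv_text = "name,age\nAlice,30\nBob,25\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# A list of dictionaries: each dictionary becomes one row
records = [
    {'name': 'Alice', 'age': 30},
    {'name': 'Bob', 'age': 25},
]
df_records = pd.DataFrame(records)

print(df_csv)
print(df_records)
```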

In [3]:
##### Creating an empty DataFrame

# Import pandas under its conventional alias
import pandas as pd

# Call the DataFrame constructor with no arguments
df = pd.DataFrame()

print(df)

Empty DataFrame
Columns: []
Index: []


In [4]:
##### Creating a DataFrame from a list

import pandas as pd

# A list of strings
aviation_lst = ['Pilot', 'Aircraft', 'Altitude', 'Runway', 'Control Tower', 'Flight Plan', 'Cockpit', 'Takeoff', 'Landing', 'Turbulence']

# Pass the list to the DataFrame constructor; it becomes a single column labelled 0
df = pd.DataFrame(aviation_lst)
print(df)

               0
0          Pilot
1       Aircraft
2       Altitude
3         Runway
4  Control Tower
5    Flight Plan
6        Cockpit
7        Takeoff
8        Landing
9     Turbulence
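
The column in the output above is labelled `0` because no name was given. A minimal sketch of naming it through the constructor's `columns` parameter follows; the name `term` and the shortened list are arbitrary choices for illustration.

```python
import pandas as pd

aviation_lst = ['Pilot', 'Aircraft', 'Altitude']

# Without `columns`, the single column gets the default integer label 0
df_default = pd.DataFrame(aviation_lst)

# Passing `columns` gives the column a readable name
df_named = pd.DataFrame(aviation_lst, columns=['term'])

print(df_named)
```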


In [5]:
##### Creating a DataFrame from a dict of ndarrays/lists
### To create a DataFrame from a dict of ndarrays/lists, all the arrays must be of the same length.
# 1. If an index is passed, its length must equal the length of the arrays.
# 2. If no index is passed, the index defaults to range(n), where n is the length of the arrays.

data = {'Name': ['Popeye', 'Homer Simpson', 'Bugs Bunny', 'Scooby-Doo'],
        'Catchphrase': ['I yam what I yam!', 'D’oh!', 'What’s up, Doc?', 'Ruh-roh!']}

# Create DataFrame
df = pd.DataFrame(data)
 
# Print the output.
print(df)

            Name        Catchphrase
0         Popeye  I yam what I yam!
1  Homer Simpson              D’oh!
2     Bugs Bunny    What’s up, Doc?
3     Scooby-Doo           Ruh-roh!
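
The two rules in the comments above can be checked directly. The sketch below uses made-up column names and values: it mixes a NumPy array with a plain list, passes an explicit index of matching length, and then shows that arrays of unequal length are rejected with a `ValueError`.

```python
import numpy as np
import pandas as pd

data = {
    'altitude_ft': np.array([1000, 2000, 3000]),
    'speed_kt': [120, 150, 180],
}

# An explicit index must supply one label per row
df = pd.DataFrame(data, index=['a', 'b', 'c'])
print(df)

# Arrays of unequal length are rejected by the constructor
try:
    pd.DataFrame({'x': [1, 2, 3], 'y': [1, 2]})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

print(mismatch_error)
```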


In [6]:
##### Creating a DataFrame from a dictionary of lists (each key becomes a column name)
food_dict = {
    'dish': ["Biryani", "Pasta", "Sushi", "Tacos"],
    'cuisine': ["Indian", "Italian", "Japanese", "Mexican"],
    'spice_level': ["High", "Medium", "Low", "Medium"]
}

df = pd.DataFrame(food_dict)
print(df)

      dish   cuisine spice_level
0  Biryani    Indian        High
1    Pasta   Italian      Medium
2    Sushi  Japanese         Low
3    Tacos   Mexican      Medium


In [7]:
##### Creating a dataframe from another dataframe
import pandas as pd

original_df = pd.DataFrame({
    'Cartoon': ['Tom & Jerry', 'SpongeBob', 'Naruto', 'Mickey Mouse'],
    'Year': [1940, 1999, 2002, 1928]
})

# Select a subset of columns; the double brackets return a DataFrame rather than a Series
new_df = original_df[['Cartoon']]
print(new_df)


        Cartoon
0   Tom & Jerry
1     SpongeBob
2        Naruto
3  Mickey Mouse
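
When the new DataFrame will be modified afterwards, one common precaution is to call `.copy()` on the selection so that it is explicitly independent of the original. The sketch below uses a shortened version of the data above, and the edited title is made up for illustration.

```python
import pandas as pd

original_df = pd.DataFrame({
    'Cartoon': ['Tom & Jerry', 'SpongeBob'],
    'Year': [1940, 1999],
})

# .copy() makes the new DataFrame explicitly independent of the original
new_df = original_df[['Cartoon']].copy()

# Change a value in the copy only
new_df.loc[0, 'Cartoon'] = 'Tom and Jerry'

print(original_df.loc[0, 'Cartoon'])  # value in the original
print(new_df.loc[0, 'Cartoon'])       # value in the copy
```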


In [8]:
##### Creating dataframe from dictionary of series

import pandas as pd

# Build a dictionary of Series that share the same index labels
cartoon_data = {
    'Character': pd.Series(['Tom', 'SpongeBob', 'Naruto', 'Mickey'], 
                           index=['a', 'b', 'c', 'd']),
    'Show': pd.Series(['Tom & Jerry', 'SpongeBob SquarePants', 'Naruto', 'Mickey Mouse Clubhouse'], 
                      index=['a', 'b', 'c', 'd'])
}

# Create DataFrame
df = pd.DataFrame(cartoon_data)

print(df)


   Character                    Show
a        Tom             Tom & Jerry
b  SpongeBob   SpongeBob SquarePants
c     Naruto                  Naruto
d     Mickey  Mickey Mouse Clubhouse
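
In the example above both Series share the same index labels. As a small sketch of what happens when they do not, the variant below (with deliberately mismatched, made-up labels) shows that pandas aligns the Series on their labels and fills the gaps with `NaN`.

```python
import pandas as pd

data = {
    'Character': pd.Series(['Tom', 'Naruto'], index=['a', 'c']),
    'Show': pd.Series(['Tom & Jerry', 'SpongeBob SquarePants'], index=['a', 'b']),
}

# The resulting index is the union of the labels from both Series
df = pd.DataFrame(data)

print(df)
print(df.isna())
```

Label `b` has no `Character` and label `c` has no `Show`, so those cells are missing in the result.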


In [9]:
##### Creating a DataFrame using zip(): two lists can be paired element-wise with zip()
names = ['Popeye', 'Homer Simpson', 'Bugs Bunny', 'Scooby-Doo']  # list 1

catchphrases = ['I yam what I yam!', 'D’oh!', 'What’s up, Doc?', 'Ruh-roh!']  # list 2

# Pair the two lists element-wise into a list of (name, catchphrase) tuples
list_of_tuples = list(zip(names, catchphrases))

# Convert the list of tuples into a pandas DataFrame
df = pd.DataFrame(list_of_tuples,
                  columns=['Name', 'Catchphrase'])

print(df)




            Name        Catchphrase
0         Popeye  I yam what I yam!
1  Homer Simpson              D’oh!
2     Bugs Bunny    What’s up, Doc?
3     Scooby-Doo           Ruh-roh!


In [10]:
##### Creating a DataFrame with custom index labels
food_dict = {
    'dish': ["Biryani", "Pasta", "Sushi", "Tacos"],
    'cuisine': ["Indian", "Italian", "Japanese", "Mexican"],
    'spice_level': ["High", "Medium", "Low", "Medium"]
}

# Create the DataFrame, passing one index label per row
df = pd.DataFrame(food_dict, index=['rank1', 'rank2', 'rank3', 'rank4'])

# Print the result
print(df)

          dish   cuisine spice_level
rank1  Biryani    Indian        High
rank2    Pasta   Italian      Medium
rank3    Sushi  Japanese         Low
rank4    Tacos   Mexican      Medium
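
Custom index labels are mainly useful for looking rows up by name. A minimal sketch, using a shortened version of the table above: `.loc` selects by label, while `.iloc` still selects by position.

```python
import pandas as pd

df = pd.DataFrame(
    {'dish': ['Biryani', 'Pasta'], 'cuisine': ['Indian', 'Italian']},
    index=['rank1', 'rank2'],
)

# One row, selected by its index label, returned as a Series
row = df.loc['rank2']

# One cell, selected by row label and column name
value = df.loc['rank1', 'dish']

# Position-based access still works alongside the labels
first = df.iloc[0]

print(row)
print(value)
```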
