# Pandas 
Pandas is a powerful, open-source Python library widely used for data manipulation, analysis, and cleaning. It provides high-performance, easy-to-use data structures and data analysis tools for working with structured data.

Key features of pandas include:

- Data Structures:
  * Series: A one-dimensional labeled array, capable of holding any data type.
  * DataFrame: A two-dimensional labeled data structure, similar to a table in a database or an Excel spreadsheet.
- Key Functionalities:
  * Data cleaning and preparation.
  * Handling missing data efficiently.
  * Filtering and selecting data based on conditions.
  * Merging, joining, and concatenating datasets.
  * Grouping and aggregating data for summary statistics.
  * Time-series analysis.
- Integration:
  * Works seamlessly with other data analysis libraries like NumPy, Matplotlib, and Scikit-learn.
  * Can read/write data from/to various file formats, such as CSV, Excel, SQL databases, JSON, and more.
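
A few of the functionalities listed above can be sketched in one short example. The `sales` and `managers` tables, their column names, and all values are made up for illustration; this is a minimal sketch, not a recommended workflow.

```python
import numpy as np
import pandas as pd

# Hypothetical data: names and values are invented for this sketch.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "units": [2.0, np.nan, 3.0, 5.0],
})
managers = pd.DataFrame({
    "region": ["North", "South"],
    "manager": ["Asha", "Ben"],
})

# Handling missing data: replace the missing unit count with 0.
sales["units"] = sales["units"].fillna(0)

# Filtering: keep only rows that satisfy a condition.
big_orders = sales[sales["units"] >= 3]

# Grouping and aggregating: total units per region.
totals = sales.groupby("region")["units"].sum()

# Merging: attach the manager for each region (left join on "region").
merged = sales.merge(managers, on="region", how="left")

print(big_orders)
print(totals)
print(merged)
```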


In [1]:
import pandas as pd 

In [5]:
# Series
# A Series is a one-dimensional, array-like object that can hold any data type. It is similar to a single column in a table.
data = [1,2,3,4,5]
series = pd.Series(data)
print("Series of the data is \n : " , series)  
print(type(series))

Series of the data is 
 :  0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>
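
As a small follow-up to the cell above, the sketch below shows two things a Series allows that a plain list does not: element-wise arithmetic and an explicit `dtype`. The variable names `doubled`, `first`, and `mixed` are introduced here only for illustration.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5])

# Arithmetic is applied element by element, with no explicit loop.
doubled = series * 2

# .iloc selects by position, like list indexing.
first = series.iloc[0]

# A Series can hold mixed types; pandas then falls back to the generic object dtype.
mixed = pd.Series([1, "two", 3.0])

print(doubled.tolist())
print(first)
print(mixed.dtype)
```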


In [13]:
# Create a Series from a dictionary: the keys become the index labels

data = {'a':10, 'b':20, 'c':30, 'd':40, 'e':50}
series = pd.Series(data)
print("Series from the dictionary is \n : " , series)
print(type(series))

Series from the dictionary is 
 :  a    10
b    20
c    30
d    40
e    50
dtype: int64
<class 'pandas.core.series.Series'>
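
Because the dictionary keys become index labels, values can be looked up by label. The sketch below also passes an explicit `index` alongside the dict, which is an addition to the original cell: labels missing from the dict come back as `NaN`. The label `'z'` is hypothetical.

```python
import pandas as pd

data = {'a': 10, 'b': 20, 'c': 30}
series = pd.Series(data)

# Look up a value by its label.
value_b = series['b']

# With an explicit index, only the requested labels are kept;
# 'z' is not a key in the dict, so its value is NaN.
subset = pd.Series(data, index=['a', 'c', 'z'])

print(value_b)
print(subset)
```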


In [12]:
# Without a dict, pass the values and the index labels as separate lists; pandas pairs them by position
data = [1,2,3]
index = ['a' , 'b' , 'c']
series = pd.Series(data, index = index)
print(" The series of the data is  : \n",series)
print(type(series))

 The series of the data is  : 
 a    1
b    2
c    3
dtype: int64
<class 'pandas.core.series.Series'>
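
Two details of a custom index are worth a quick sketch: slicing by label includes both endpoints, and the index list must be as long as the data. The variable names below are illustrative only.

```python
import pandas as pd

series = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Label-based slicing includes both endpoints ('a' and 'b').
by_label = series['a':'b']

# Positional slicing excludes the stop position, as with Python lists.
by_position = series.iloc[0:2]

# The index must have the same length as the data; otherwise pandas raises ValueError.
try:
    pd.Series([1, 2, 3], index=['a', 'b'])
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(by_label.tolist())
print(by_position.tolist())
print(length_error)
```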


In [17]:
# DataFrame
# A DataFrame is a two-dimensional, labeled dataset made up of rows and columns.
# It is mutable and heterogeneous: each column can hold a different data type.
# Create a DataFrame from a dict of lists

data = {
    'Name' : ['Anand' , 'Krish' , 'Greg'],
    'Age' : [22 , 40 , 38],
    'City' : ['Chennai' , 'Mumbai' , 'America']
}
df = pd.DataFrame(data=data)
print("The Dataset in rows and cols from the dict : \n " , df) # the dict has been converted to a DataFrame
print(type(df))

The Dataset in rows and cols from the dict : 
      Name  Age     City
0  Anand   22  Chennai
1  Krish   40   Mumbai
2   Greg   38  America
<class 'pandas.core.frame.DataFrame'>
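
The words "mutable" and "heterogeneous" in the comments above can be made concrete with a short sketch. The `Senior` column is hypothetical and exists only to show that columns can be added after creation.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anand', 'Krish', 'Greg'],
    'Age': [22, 40, 38],
    'City': ['Chennai', 'Mumbai', 'America'],
})

# Heterogeneous: each column has its own dtype.
print(df.dtypes)

# Mutable: existing columns can be updated in place...
df['Age'] = df['Age'] + 1

# ...and new columns can be added (hypothetical 'Senior' flag).
df['Senior'] = df['Age'] > 30

print(df.shape)
print(df)
```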


In [18]:
# Create a DataFrame from a list of dicts: each dict becomes one row

data = [
    {'Name' : 'Anand', 'Age' : 22, 'City' : 'Chennai'},
    {'Name' : 'Krish', 'Age' : 40, 'City' : 'Mumbai'},
    {'Name' : 'Greg', 'Age' : 38, 'City' : 'America'}
]
df = pd.DataFrame(data=data)
print("The Dataset is : \n " , df)
print(type(df))



The Dataset is : 
      Name  Age     City
0  Anand   22  Chennai
1  Krish   40   Mumbai
2   Greg   38  America
<class 'pandas.core.frame.DataFrame'>
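
One practical difference with the list-of-dicts form is that the records do not need identical keys. In the sketch below the second record deliberately omits `City` (an assumption made for illustration), and pandas fills the gap with `NaN`.

```python
import pandas as pd

records = [
    {'Name': 'Anand', 'Age': 22, 'City': 'Chennai'},
    {'Name': 'Krish', 'Age': 40},  # 'City' intentionally left out
]
df = pd.DataFrame(records)

# Columns are the union of all keys; the missing value shows up as NaN.
print(df)
print(df['City'].isna().tolist())
```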


In [20]:
# Read a CSV file into a DataFrame using read_csv
data = pd.read_csv('/Users/anand/Desktop/Machine Learning/DataSet/Online Sales Data.csv')
# print("The first 5 rows of the dataset: \n", data.head())
data.head()

   Transaction ID        Date  Product Category             Product Name  Units Sold  Unit Price  Total Revenue         Region  Payment Method
0           10001  2024-01-01       Electronics            iPhone 14 Pro           2      999.99        1999.98  North America     Credit Card
1           10002  2024-01-02   Home Appliances         Dyson V11 Vacuum           1      499.99         499.99         Europe          PayPal
2           10003  2024-01-03          Clothing         Levi's 501 Jeans           3       69.99         209.97           Asia      Debit Card
3           10004  2024-01-04             Books        The Da Vinci Code           4       15.99          63.96  North America     Credit Card
4           10005  2024-01-05   Beauty Products  Neutrogena Skincare Set           1       89.99          89.99         Europe          PayPal
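
The path in the cell above is specific to one machine, so here is a self-contained sketch of the same call that reads from an in-memory string instead. The CSV text reuses a few rows and columns from the output above; `parse_dates` and `index_col` are optional `read_csv` parameters that the original cell does not use.

```python
import io

import pandas as pd

# A small in-memory stand-in for the CSV file on disk.
csv_text = """Transaction ID,Date,Product Category,Units Sold,Unit Price
10001,2024-01-01,Electronics,2,999.99
10002,2024-01-02,Home Appliances,1,499.99
10003,2024-01-03,Clothing,3,69.99
"""

sales = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=['Date'],        # convert the Date column to datetimes
    index_col='Transaction ID',  # use the transaction ID as the row index
)

print(sales.head())
print(sales.dtypes)
```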


In [21]:
# Accessing data from the DataFrame
df

    Name  Age     City
0  Anand   22  Chennai
1  Krish   40   Mumbai
2   Greg   38  America


In [27]:
data.head()

   Transaction ID        Date  Product Category             Product Name  Units Sold  Unit Price  Total Revenue         Region  Payment Method
0           10001  2024-01-01       Electronics            iPhone 14 Pro           2      999.99        1999.98  North America     Credit Card
1           10002  2024-01-02   Home Appliances         Dyson V11 Vacuum           1      499.99         499.99         Europe          PayPal
2           10003  2024-01-03          Clothing         Levi's 501 Jeans           3       69.99         209.97           Asia      Debit Card
3           10004  2024-01-04             Books        The Da Vinci Code           4       15.99          63.96  North America     Credit Card
4           10005  2024-01-05   Beauty Products  Neutrogena Skincare Set           1       89.99          89.99         Europe          PayPal


In [35]:
data['Product Category'].head() # Select a single column by name; the result is a Series

0        Electronics
1    Home Appliances
2           Clothing
3              Books
4    Beauty Products
Name: Product Category, dtype: object
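
Selecting one column is only one way to access data. The sketch below shows a few other common forms on a small frame built from the first rows shown above, so it runs without the CSV file. The variable names are illustrative only.

```python
import pandas as pd

# A small stand-in for the sales data, using values from the output above.
data = pd.DataFrame({
    'Product Category': ['Electronics', 'Home Appliances', 'Clothing'],
    'Units Sold': [2, 1, 3],
    'Unit Price': [999.99, 499.99, 69.99],
})

# A list of column names returns a DataFrame with just those columns.
two_cols = data[['Product Category', 'Units Sold']]

# .iloc selects by position: here, the first row as a Series.
first_row = data.iloc[0]

# .loc selects by row label and column name.
price = data.loc[1, 'Unit Price']

# A boolean condition filters rows.
multi_unit = data[data['Units Sold'] > 1]

print(two_cols)
print(first_row)
print(price)
print(multi_unit)
```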