# PANDAS 

## INTRODUCTION

Pandas is an open-source library designed for easy and intuitive manipulation of relational or labeled data. It provides data structures and operations for working with numerical tables and time series.

Pandas is fast and helps users work productively. Python with Pandas is used in both academic and commercial settings, in fields such as finance, economics, statistics, and analytics.

![image.png](attachment:image.png)  ![image-2.png](attachment:image-2.png)

## PROBLEM

Tabular format is still the most common way to store data, and there is no better tool for exploring data tables.

What is needed is a convenient tabular data processor that can load and process datasets and export them to many output formats, and that can handle as much data as fits in the memory of the PC.

## SOLUTION

Pandas provides a highly optimized format for representing data, which helps you analyze and understand it. Simpler data representations tend to give better results in data science projects.

Pandas is popular in Python because of the following advantages:

- Fast and efficient for manipulating and analyzing data. 
- Data from different file objects can be loaded. 
- Easy handling of missing data
- Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects 
- Data set merging and joining. 
- Flexible reshaping and pivoting of data sets 
- Provides time-series functionality. 
- Powerful group by functionality for performing split-apply-combine operations on data sets. 
- Series for one-dimensional data and DataFrame for two-dimensional data.
- Efficient ways to slice data and split it into subsets.
- Flexible ways to merge, concatenate, or reshape data.
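A few of the advantages listed above can be sketched in one short example. The table contents and the names `sales`, `stores`, `totals`, and `report` are made up for illustration; this is a minimal sketch of missing-data handling, group by, and merging, not a recommended workflow.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; one amount is missing (NaN).
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "amount": [10.0, np.nan, 7.0, 3.0],
})

# Hypothetical lookup table mapping each store to a city.
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Oslo", "Lima"]})

# Missing data: replace NaN with 0 before aggregating.
sales["amount"] = sales["amount"].fillna(0)

# Split-apply-combine: total amount per store.
totals = sales.groupby("store", as_index=False)["amount"].sum()

# Merging: attach the city to each store's total.
report = totals.merge(stores, on="store")
print(report)
```

Filling with 0 is only one possible choice here; depending on the data, dropping the row with `dropna()` may be more appropriate.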


## CONTENTS

- Getting Started 
- Series
  - Creating a Series
- DataFrame
  - Creating a DataFrame

### Getting Started

In [None]:
# Install pandas with pip (run this once)
!pip install pandas

In [None]:
# Import pandas under its conventional alias, pd
import pandas as pd

Pandas provides two main data structures for manipulating data:

* Series 
* DataFrame 

![image.png](attachment:image.png)

### Series

A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index. You can think of a Series as a single column in an Excel sheet. Labels need not be unique, but they must be of a hashable type. The object supports both integer-based and label-based indexing and provides a host of methods for operations involving the index.
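The points about labels can be illustrated with a small sketch. The values and labels are invented for the example; it only shows that labels may repeat and that both label-based and position-based lookups work.

```python
import pandas as pd

# Labels may repeat; "a" appears twice here (values are illustrative).
s = pd.Series([10, 20, 30], index=["a", "b", "a"])

print(s.index)    # the axis labels, collectively the index
print(s["b"])     # a unique label returns a single value
print(s["a"])     # a repeated label returns a Series with every match
print(s.iloc[0])  # integer (position-based) lookup
```

Because a repeated label returns several rows, unique labels are usually easier to work with when you expect a single value back.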

#### Creating a Series

In practice, a Pandas Series is usually created by loading data from existing storage, such as a SQL database, a CSV file, or an Excel file. A Series can also be created from a list, a dictionary, a scalar value, and so on.
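As a minimal sketch of the in-memory options mentioned above, the following builds a Series from a list, a dictionary, and a scalar. The variable names and values are placeholders chosen for the example.

```python
import pandas as pd

# From a list: pandas assigns the default index 0, 1, 2.
from_list = pd.Series([1, 2, 3])

# From a dictionary: the keys become the index labels.
from_dict = pd.Series({"x": 1.5, "y": 2.5})

# From a scalar: the value is repeated once per index label.
from_scalar = pd.Series(7, index=["a", "b", "c"])

print(from_list)
print(from_dict)
print(from_scalar)
```

When building from a scalar, the `index` argument is what determines the length of the result.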

In [None]:
# Example: create an empty Series
import pandas as pd

# Pass an explicit dtype; older pandas versions warn when an empty Series has none
ser = pd.Series(dtype="float64")

print(ser)

In [1]:
import pandas as pd
import numpy as np
 
x = pd.Series([1, 2, 2, np.nan], index=['p', 'q', 'r', 's'])
x

p    1.0
q    2.0
r    2.0
s    NaN
dtype: float64

![image-2.png](attachment:image-2.png)

In [3]:
# Example: create a pandas Series from a NumPy array
a = np.array(['h','e','l','l','o','R','B','G'])
ser = pd.Series(a)
print(ser)

0    h
1    e
2    l
3    l
4    o
5    R
6    B
7    G
dtype: object


- To access elements of a Pandas Series, use the index label or position of the element.

In [4]:
# Accessing elements of a Series
a = np.array(['h','e','l','l','o','R','B','G'])
ser = pd.Series(a)
print(ser[3])  # element at index label 3 (the fourth element)
print(ser[7])  # element at index label 7 (the eighth element)

l
G
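When a Series has custom labels, it can help to be explicit about whether a lookup is by label or by position. The sketch below uses `.loc` and `.iloc` for that; the labels and values are invented for the example.

```python
import pandas as pd

ser = pd.Series([1.0, 2.0, 3.0, 4.0], index=["p", "q", "r", "s"])

first = ser.iloc[0]             # by position: first element
by_label = ser.loc["q"]         # by label: element labeled "q"
pos_slice = ser.iloc[1:3]       # positions 1 and 2 (end position excluded)
label_slice = ser.loc["q":"r"]  # labels "q" through "r" (end label included)

print(first, by_label)
print(pos_slice)
print(label_slice)
```

Note the difference between the two slices: position slices exclude the end, while label slices include it.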


### DataFrame

A Pandas DataFrame is a two-dimensional data structure: data is aligned in a tabular fashion in rows and columns. A DataFrame consists of three principal components: the data, the rows, and the columns.
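The three components can be inspected directly on any DataFrame. The following is a small sketch with made-up values; the row labels `row1` and `row2` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"apples": [3, 2], "oranges": [0, 3]},
    index=["row1", "row2"],  # hypothetical row labels
)

print(df.index.tolist())    # the row labels
print(df.columns.tolist())  # the column labels
print(df.to_numpy())        # the underlying data as a NumPy array
print(df.shape)             # (number of rows, number of columns)
```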

#### Creating a DataFrame

In practice, a Pandas DataFrame is usually created by loading data from existing storage, such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be created from a list, a dictionary, a list of dictionaries, and so on.
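Two of the routes mentioned above can be sketched as follows: building a DataFrame from a list of dictionaries, and loading one from CSV. The records and CSV text are invented for the example, and the CSV is read from an in-memory string (via `io.StringIO`) so the snippet runs without a file on disk; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# From a list of dictionaries: keys become columns, each dict is a row.
records = [
    {"name": "Ann", "age": 31},
    {"name": "Bob"},  # no "age" key, so pandas fills that cell with NaN
]
from_records = pd.DataFrame(records)
print(from_records)

# From CSV text: the first line is used as the column header.
csv_text = "name,age\nAnn,31\nBob,45\n"
from_csv = pd.read_csv(io.StringIO(csv_text))
print(from_csv)
```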

In [5]:
# example program for creating an empty DataFrame
import pandas as pd
   
# Calling DataFrame constructor
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []


In [6]:
# Example: create a DataFrame from a list of strings
lst = ['Hello', 'All', 'Welcome', 'To', 'RBG']
   
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)

         0
0    Hello
1      All
2  Welcome
3       To
4      RBG


In [7]:
import pandas as pd

data = {
    'apples': [3, 2, 0, 1], 
    'oranges': [0, 3, 7, 2]
      }
purchases = pd.DataFrame(data)

purchases

   apples  oranges
0       3        0
1       2        3
2       0        7
3       1        2


![image.png](attachment:image.png)