# Introduction

Pandas is a powerful and popular Python library for data manipulation and analysis. It provides data structures and functions for handling structured data efficiently, including tabular data such as spreadsheets and SQL tables.

# Why Use Pandas?
1. **Efficient Data Handling:** Pandas handles datasets with millions of rows efficiently, provided the data fits in memory.
2. **Flexible Data Structures:** Pandas provides two primary data structures: Series (1-dimensional labeled array) and DataFrame (2-dimensional labeled data structure with columns of potentially different types).
3. **Easy Data Manipulation:** Pandas provides various methods for data manipulation, such as filtering, sorting, grouping, and merging data.
4. **Integration with Other Libraries:** Pandas integrates well with other popular Python libraries, such as NumPy, Matplotlib, and Scikit-learn, making it a great tool for data science and scientific computing.
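
The manipulation methods named in point 3 can be sketched on a tiny example. The two tables below, including their column names and values, are invented for illustration and do not come from any dataset used in this notebook.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
films = pd.DataFrame({
    "film": ["A", "B", "C", "D"],
    "genre": ["Comedy", "Drama", "Comedy", "Drama"],
    "score": [70, 52, 35, 88],
})
studios = pd.DataFrame({
    "film": ["A", "B", "C", "D"],
    "studio": ["Fox", "Disney", "Fox", "Independent"],
})

# Filtering: keep rows whose score is at least 50.
high = films[films["score"] >= 50]

# Sorting: order rows by score, highest first.
ranked = films.sort_values("score", ascending=False)

# Grouping: mean score per genre.
mean_by_genre = films.groupby("genre")["score"].mean()

# Merging: join the two tables on the shared "film" column.
merged = films.merge(studios, on="film")

print(high, ranked, mean_by_genre, merged, sep="\n\n")
```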


# Types of Data Structures
Pandas provides two primary data structures:
1. Series (1-dimensional labeled array): pd.Series
2. DataFrame (2-dimensional labeled data structure with columns of potentially different types): pd.DataFrame

## Pandas Series
What is a Pandas Series?
<ul style="margin:0">
<li>A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.).</li>
<li>The axis labels are collectively referred to as the index.</li>
<li>Series is similar to a column in a DataFrame or a single column of data in a table.</li>
</ul>

In [1]:
#Creating a Series from list
import pandas as pd
import numpy as np

data = [1, 2, 3, 4, 5]
s = pd.Series(data)
s

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [2]:
#Creating a Series from numpy array
np_array = np.array([1, 2, 3, 4, 5])
data = pd.Series(np_array)
data

0    1
1    2
2    3
3    4
4    5
dtype: int32

In [3]:
#Creating a Series from dictionary
d = {'key1':'a', 'key2':'b', 'key3':'c', 'key4':'d'}
data = pd.Series(d)
data

key1    a
key2    b
key3    c
key4    d
dtype: object

In [4]:
#Creating a Series from list with custom index

data = [1, 2, 3, 4, 5]
s = pd.Series(data, index=['a', 'b', 'c', 'd', 'e'])
s

a    1
b    2
c    3
d    4
e    5
dtype: int64

## Accessing Data in a Series

In [5]:
# Using Index
s['a']

1

In [6]:
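# By integer position. Plain s[0] on a label-indexed Series is deprecated
# in recent pandas versions, so .iloc is the reliable spelling.
s.iloc[0]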

1

In [7]:
# using slicing
s[1:3]

b    2
c    3
dtype: int64

In [8]:
#using .iloc and .loc
print(s.iloc[0])
print(s.loc['b'])

1
2
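
The difference between label-based and position-based access is easiest to see with slices and with an integer index. The following is a minimal sketch; the second Series `t` is a hypothetical example introduced here only to show why the distinction matters.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# .iloc slices by position and excludes the end position.
by_position = s.iloc[1:3]

# .loc slices by label and includes the end label.
by_label = s.loc["b":"d"]

# With an integer index, labels and positions can disagree.
t = pd.Series([10, 20, 30], index=[2, 1, 0])
label_zero = t.loc[0]      # the value whose label is 0
position_zero = t.iloc[0]  # the first value in the Series

print(by_position.tolist(), by_label.tolist(), label_zero, position_zero)
```

Using `.loc` and `.iloc` explicitly avoids the ambiguity of plain square-bracket access.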


## Pandas DataFrame
The pandas DataFrame is a two-dimensional table of data with column and row indexes. The columns are made up of pandas Series objects.
<br>
<img src="img/Picture1.png">
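
A short sketch of the statement that columns are Series objects. The column names and values here are made up for the example.

```python
import pandas as pd

# Hypothetical two-column table.
df = pd.DataFrame({"name": ["x", "y"], "value": [1.5, 2.5]})

# Selecting a single column returns a Series.
col = df["value"]
is_series = isinstance(col, pd.Series)

# The column shares the row index of its parent DataFrame.
same_index = col.index.equals(df.index)

print(is_series, same_index, col.dtype)
```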

### Get your data into a DataFrame
1. Load a DataFrame from a CSV file
<img src="img/Picture4.png">
<br>
2. Load DataFrames from a Microsoft Excel file
<img src="img/Picture5.png">
<br>
3. Load a DataFrame from a MySQL database
<img src="img/Picture6.png"> 
<br>
4. Build Series and combine them into a DataFrame
<img src="img/Picture7.png"> 
<br>
5. Get a DataFrame from data in a Python dictionary
<img src="img/Picture8.png">
<img src="img/Picture9.png">
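
The routes listed above can be sketched in text form as follows. This is an illustration under assumptions, not a copy of the pictured code: the sample data is invented, an in-memory string stands in for a CSV file, and an in-memory SQLite database stands in for MySQL so the example runs without a server. Reading Excel needs an optional engine such as `openpyxl`, so it appears only as a comment with a hypothetical file name.

```python
import io
import sqlite3

import pandas as pd

# 1. CSV: read_csv accepts a path or any file-like object.
csv_text = "film,year\nA,2008\nB,2010\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# 2. Excel (not run here; "workbook.xlsx" is a hypothetical file):
#    from_excel = pd.read_excel("workbook.xlsx")

# 3. SQL: SQLite is used here as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (film TEXT, year INTEGER)")
conn.executemany("INSERT INTO films VALUES (?, ?)", [("A", 2008), ("B", 2010)])
from_sql = pd.read_sql_query("SELECT film, year FROM films", conn)
conn.close()

# 4. Series combined column-wise; each Series name becomes a column name.
film = pd.Series(["A", "B"], name="film")
year = pd.Series([2008, 2010], name="year")
from_series = pd.concat([film, year], axis=1)

# 5. Dictionary: keys become column names, values become columns.
from_dict = pd.DataFrame({"film": ["A", "B"], "year": [2008, 2010]})

print(from_csv, from_sql, from_series, from_dict, sep="\n\n")
```

All four executed routes produce the same two-row table, which makes it easy to compare them side by side.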

In [9]:
#Load dataframe from csv
df = pd.read_csv("data/movies.csv")
df.head()

,Film,Genre,Lead Studio,Audience score %,Profitability,Rotten Tomatoes %,Worldwide Gross,Year
0,Zack and Miri Make a Porno,Romance,The Weinstein Company,70,1.747542,64,$41.94,2008
1,Youth in Revolt,Comedy,The Weinstein Company,52,1.09,68,$19.62,2010
2,You Will Meet a Tall Dark Stranger,Comedy,Independent,35,1.211818,43,$26.66,2010
3,When in Rome,Comedy,Disney,44,0.0,15,$43.04,2010
4,What Happens in Vegas,Comedy,Fox,72,6.267647,28,$219.37,2008


The pandas Index provides the axis labels for Series and DataFrame objects. It can only contain hashable objects. A pandas Series has one Index, and a DataFrame has two: one for the rows and one for the columns.
<br>
<img src="img/Picture3.png">
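
The one-Index versus two-Index distinction can be checked directly. This is a minimal sketch with made-up labels.

```python
import pandas as pd

# A Series carries a single Index.
s = pd.Series([1, 2, 3], index=["a", "b", "c"])
series_labels = s.index.tolist()

# A DataFrame carries two: .index for rows and .columns for columns.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])
row_labels = df.index.tolist()
column_labels = df.columns.tolist()

# .axes returns both as a list: [row index, column index].
axis_count = len(df.axes)

print(series_labels, row_labels, column_labels, axis_count)
```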

In [10]:
# get index and columns from dataframe
idx = df.index
print(idx)
cols = df.columns
print(cols)

RangeIndex(start=0, stop=77, step=1)
Index(['Film', 'Genre', 'Lead Studio', 'Audience score %', 'Profitability',
       'Rotten Tomatoes %', 'Worldwide Gross', 'Year'],
      dtype='object')


In [11]:
# some index attributes
b = idx.is_monotonic_decreasing
print(b)
b = idx.is_monotonic_increasing
print(b)
b = idx.has_duplicates
print(b)
i = idx.nlevels
print(i)

False
True
False
1


In [12]:
# index labels as a NumPy array (.values is an attribute, not a method)
a = idx.values
print(type(a), a)

<class 'numpy.ndarray'> [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76]


In [13]:
l = idx.tolist()
print(type(l), l)

<class 'list'> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76]


In [14]:
idx1 = idx.astype('str')
idx1

Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12',
       '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24',
       '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36',
       '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48',
       '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60',
       '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72',
       '73', '74', '75', '76'],
      dtype='object')

In [15]:
i = idx.nunique()
print(i)

77


In [16]:
max_label = idx.max()
print(max_label)
min_label = idx.min()
print(min_label)

76
0


### Saving a DataFrame
1. Saving a DataFrame to a CSV file
<img src="img/Picture10.png">
<br>
2. Saving DataFrames to an Excel Workbook
<img src="img/Picture11.png">
<br>
3. Saving a DataFrame to MySQL
<img src="img/Picture12.png"> 
<br>
4. Saving a DataFrame to a Python dictionary
<img src="img/Picture13.png"> 
<br>
5. Saving a DataFrame to a Python string
<img src="img/Picture14.png"> 
<br>
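
The saving routes listed above can be sketched in text form as follows. As with loading, this rests on assumptions: the sample data is invented, an in-memory buffer stands in for a CSV file, and an in-memory SQLite database stands in for MySQL. Writing Excel needs an optional engine such as `openpyxl`, so it appears only as a comment with a hypothetical file name.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"film": ["A", "B"], "year": [2008, 2010]})

# 1. CSV: to_csv accepts a path or a file-like object.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_lines = buffer.getvalue().splitlines()

# 2. Excel (not run here; "out.xlsx" is a hypothetical file):
#    df.to_excel("out.xlsx", index=False)

# 3. SQL: SQLite is used here as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
df.to_sql("films", conn, index=False)
row_count = conn.execute("SELECT COUNT(*) FROM films").fetchone()[0]
conn.close()

# 4. Dictionary: orient="list" maps each column name to a list of values.
as_dict = df.to_dict(orient="list")

# 5. String: a plain-text rendering of the table.
as_text = df.to_string(index=False)

print(csv_lines, row_count, as_dict, as_text, sep="\n")
```

Passing `index=False` leaves the row labels out of the saved output, which is usually what you want when the index is just the default 0, 1, 2, and so on.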