# Pandas Introduction

# What is Pandas?


* Pandas is a Python library used for working with data sets.

* It has functions for analyzing, cleaning, exploring, and manipulating data.

* The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.



# Why Use Pandas?
* Pandas allows us to analyze large data sets and draw conclusions based on statistical theories.

* Pandas can clean messy data sets and make them readable and relevant.

* Relevant data is very important in data science.

# What Can Pandas Do?
* Pandas gives you answers about the data, such as:
  * Is there a correlation between two or more columns?
  * What is the average value?
  * What is the maximum value?
  * What is the minimum value?

* Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
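
As a rough illustration of the questions above, the sketch below uses two small Series. The names `hours` and `score` and all of their values are invented for this example and do not come from a real data set.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
hours = pd.Series([1.0, 2.0, 3.0, 4.0, None])
score = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Correlation between the two Series (pairs with a missing value are ignored)
correlation = hours.corr(score)
print(correlation)

# Average, maximum and minimum (missing values are skipped by default)
print(hours.mean(), hours.max(), hours.min())

# "Cleaning": drop the entries that are empty (NaN)
cleaned = hours.dropna()
print(len(hours), len(cleaned))
```

Here `dropna()` returns a new Series and leaves `hours` unchanged, so the cleaned result has to be assigned to a name if it is needed later.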



# Pandas Series

* A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index. Labels need not be unique but must be of a hashable type. The object supports both integer and label-based indexing and provides a host of methods for performing operations involving the index.

* To create a Series with any of the methods below, first import the pandas library.
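
The difference between label-based and integer (position-based) indexing, and the effect of a repeated label, can be sketched as follows. The Series `s` and its labels are hypothetical and chosen only to show the behaviour.

```python
import pandas as pd

# Hypothetical Series in which the label "a" appears twice
s = pd.Series([10, 20, 30], index=["a", "b", "a"])

# Label-based access: a unique label returns a single value
print(s.loc["b"])

# A repeated label returns every matching entry as a Series
print(s.loc["a"])

# Position-based access: the first element, whatever its label is
print(s.iloc[0])
```

Using `.loc` for labels and `.iloc` for positions keeps the intent explicit, which matters most when the labels are themselves integers.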

# Creating an empty Series:
* The Series() constructor of Pandas is used to create a Series. The most basic Series that can be created is an empty Series.

In [1]:
import pandas as pd

# Creating an empty Series. The dtype is given explicitly because the
# default dtype of an empty Series differs between pandas versions.
ser = pd.Series(dtype="float64")

print(ser)


Series([], dtype: float64)




# Creating a series from array: 
* To create a Series from a NumPy array, import the numpy module and build the array with its array() function.

In [3]:
# import pandas as pd
import pandas as pd

# import numpy as np
import numpy as np

# simple array
data = np.array(['a', 'e', 'i', 'o', 'u'])

ser = pd.Series(data)
# By default, the index runs from 0 to len(ser) - 1
print(ser)


0    a
1    e
2    i
3    o
4    u
dtype: object


# Creating a series from array with an index: 
* To create a Series with an explicit index instead of the default one, pass a list of labels to the index parameter. The list must contain the same number of elements as the array.

In [4]:
# import pandas as pd
import pandas as pd

# import numpy as np
import numpy as np

# simple array
data = np.array(['a', 'e', 'i', 'o', 'u'])

# providing an index
ser = pd.Series(data, index=[10, 11, 12, 13, 14])
print(ser)


10    a
11    e
12    i
13    o
14    u
dtype: object
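
The requirement that the index has as many labels as the array has elements can be checked with a short sketch. The three-label index below is a deliberately wrong, hypothetical input.

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'e', 'i', 'o', 'u'])

# A matching index: values can then be looked up by label
ser = pd.Series(data, index=[10, 11, 12, 13, 14])
print(ser.loc[12])

raised = False
try:
    # Five values but only three labels: the lengths do not match
    pd.Series(data, index=[10, 11, 12])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```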


# Creating a series from Lists:
* To create a Series from a list, first create the list and then pass it to Series().

In [5]:
import pandas as pd

# named "letters" rather than "list" to avoid shadowing the built-in list type
letters = ['a', 'e', 'i', 'o', 'u']

# create a Series from a list
ser = pd.Series(letters)
print(ser)


0    a
1    e
2    i
3    o
4    u
dtype: object


# Creating a series from Dictionary: 
* To create a Series from a dictionary, first create the dictionary and then pass it to Series(). The dictionary keys become the index of the Series.

In [6]:
import pandas as pd

# a simple dictionary (named "person" rather than "dict" to avoid shadowing the built-in)
person = {'Name': 'Shreyash',
          'Age': 20,
          'weight': 70,
          'height': 175}

# create a Series from the dictionary
ser = pd.Series(person)

print(ser)


Name      Shreyash
Age             20
weight          70
height         175
dtype: object
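
When a dictionary is combined with the index parameter, only the listed keys are taken, in the listed order. The sketch below assumes a small hypothetical dictionary and a label, 'age', that is not one of its keys.

```python
import pandas as pd

# Hypothetical dictionary with two numeric entries
measurements = {'weight': 70, 'height': 175}

# 'age' is not a key of the dictionary, so its value is missing (NaN)
ser = pd.Series(measurements, index=['height', 'weight', 'age'])
print(ser)
```

Because NaN is a floating-point value, the integers in this Series are converted to floats.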


# Creating a series from Scalar value: 
* To create a Series from a scalar value, an index must be provided. The scalar value is repeated to match the length of the index.

In [7]:
import pandas as pd

# giving a scalar value with index
ser = pd.Series(10, index=[0, 1, 2, 3, 4, 5])

print(ser)


0    10
1    10
2    10
3    10
4    10
5    10
dtype: int64


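# Creating a series from a numeric NumPy array:
* You can create a Series from a NumPy array, which is useful when working with numerical data. The integer dtype in the output (int32 here) depends on the platform and the NumPy version. On other systems it may be int64.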

In [2]:
import pandas as pd
import numpy as np

data = np.array([0,1,2,3,4,5])
series = pd.Series(data)
print(series)


0    0
1    1
2    2
3    3
4    4
5    5
dtype: int32


In [3]:
print(series.index)
print(series.values)
print(series.dtype)


RangeIndex(start=0, stop=6, step=1)
[0 1 2 3 4 5]
int32
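
Since the default integer dtype can differ between systems, one option is to request the dtype explicitly. The following is a minimal sketch of that idea. The names `series64` and `as_float` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

data = np.array([0, 1, 2, 3, 4, 5])

# Request a specific dtype instead of relying on the platform default
series64 = pd.Series(data, dtype="int64")
print(series64.dtype)

# An existing Series can also be converted with astype()
as_float = series64.astype("float64")
print(as_float.dtype)
```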


# Methods
* series.head(n): Returns the first n elements.
* series.tail(n): Returns the last n elements.
* series.describe(): Provides descriptive statistics for the Series.
* series.mean(): Computes the mean of the Series values.
* series.median(): Computes the median of the Series values.
* series.max(): Returns the maximum value.
* series.min(): Returns the minimum value.
* series.sum(): Returns the sum of the Series values.
* series.value_counts(): Returns the counts of unique values.
* series.apply(func): Applies a function to each element in the Series.
* series.map(arg): Maps each value of the Series using a function, a dictionary, or another Series.

In [9]:
print(series.head(3))
print(series.tail(2))
print(series.describe())
print(series.value_counts())
print(series.apply(lambda x: x * 2))
print(series.map(lambda x: x ** 2))


0    0
1    1
2    2
dtype: int32
4    4
5    5
dtype: int32
count    6.000000
mean     2.500000
std      1.870829
min      0.000000
25%      1.250000
50%      2.500000
75%      3.750000
max      5.000000
dtype: float64
0    1
1    1
2    1
3    1
4    1
5    1
dtype: int64
0     0
1     2
2     4
3     6
4     8
5    10
dtype: int64
0     0
1     1
2     4
3     9
4    16
5    25
dtype: int64
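
The difference between map() with a dictionary and apply() with a function can be sketched as follows. The lookup table `names` is hypothetical and intentionally covers only some of the values.

```python
import pandas as pd

series = pd.Series([0, 1, 2, 3, 4, 5])

# Hypothetical lookup table that covers only some of the values
names = {0: "zero", 1: "one", 2: "two"}

# map() with a dict replaces each value by its dictionary entry;
# values without an entry become NaN
mapped = series.map(names)
print(mapped)

# apply() with a function computes a new value for every element
doubled = series.apply(lambda x: x * 2)
print(doubled)
```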


# Series Operations 
Pandas Series supports a wide range of operations that allow for effective data manipulation and analysis.

# Arithmetic Operations
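* You can perform arithmetic operations between Series objects or between a Series and a scalar value. These operations are performed element-wise and align the two Series on their index labels. A label that appears in only one of the two Series produces NaN in the result, which is why the results below have dtype float64.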

In [16]:
#Addition (+)

import pandas as pd

series1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
series2 = pd.Series([4, 5, 6], index=['a', 'b', 'd'])
sum_series = series1 + series2
print(sum_series)


a    5.0
b    7.0
c    NaN
d    NaN
dtype: float64


In [17]:
#Subtraction (-)
sub_series = series1 - series2
print(sub_series)


a   -3.0
b   -3.0
c    NaN
d    NaN
dtype: float64


In [18]:
#Multiplication (*)
mul_series = series1 * series2
print(mul_series)


a     4.0
b    10.0
c     NaN
d     NaN
dtype: float64


In [19]:
#Division (/)
div_series = series1 / series2
print(div_series)


a    0.25
b    0.40
c     NaN
d     NaN
dtype: float64
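
If the NaN values produced by non-matching labels are not wanted, two common options are to substitute a default for the missing side or to drop the incomplete results. A minimal sketch, reusing series1 and series2 from above:

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
series2 = pd.Series([4, 5, 6], index=['a', 'b', 'd'])

# Treat a label missing from one Series as 0 instead of producing NaN
filled_sum = series1.add(series2, fill_value=0)
print(filled_sum)

# Alternatively, keep only the labels present in both Series
common_sum = (series1 + series2).dropna()
print(common_sum)
```

The methods sub(), mul() and div() accept the same fill_value parameter as add().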


# Scalar Operations
* You can also perform operations between a Series and a scalar value.

In [20]:
add_scalar = series1 + 5
print(add_scalar)

sub_scalar = series1 - 2
print(sub_scalar)

mul_scalar = series1 * 10
print(mul_scalar)

div_scalar = series1 / 2
print(div_scalar)


a    6
b    7
c    8
dtype: int64
a   -1
b    0
c    1
dtype: int64
a    10
b    20
c    30
dtype: int64
a    0.5
b    1.0
c    1.5
dtype: float64


# Logical Operations
* You can also perform element-wise logical operations on Series.

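* Comparison (==, !=, <, <=, >, >=): the comparison operators require both Series to have identical index labels. series1 (a, b, c) and series2 (a, b, d) do not, so the comparison below raises an error.

In [24]:
# series1 and series2 have different index labels, so == raises a ValueError
total_series = series1 == series2


ValueError: Can only compare identically-labeled Series objects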
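
Comparisons that do work can be sketched as follows: a comparison with a scalar, a combination of boolean conditions, and the eq() method, which aligns the indexes before comparing. The Series `other` and its values are hypothetical.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
other = pd.Series([1, 5, 6], index=['a', 'b', 'd'])  # hypothetical values

# Comparison with a scalar returns a boolean Series
greater = series1 > 1
print(greater)

# Boolean Series can be combined with & (and), | (or) and ~ (not)
mask = (series1 > 1) & (series1 < 3)
print(series1[mask])

# eq() aligns the two indexes first, so differing labels do not raise;
# labels missing from either side compare as False
equal = series1.eq(other)
print(equal)
```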

# Statistical Operations
* Pandas Series has built-in methods for common statistical operations.

* Descriptive Statistics

In [25]:
print(series1.mean())    # Mean
print(series1.median())  # Median
print(series1.std())     # Standard deviation
print(series1.var())     # Variance
print(series1.min())     # Minimum value
print(series1.max())     # Maximum value
print(series1.sum())     # Sum of values
print(series1.cumsum())  # Cumulative sum


2.0
2.0
1.0
1.0
1
3
6
a    1
b    3
c    6
dtype: int64
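
The statistical methods skip missing values by default, which matters once a Series contains NaN. A small sketch with a hypothetical Series `with_gap`:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value
with_gap = pd.Series([1.0, np.nan, 3.0])

# The missing value is ignored: the mean is taken over 1.0 and 3.0
print(with_gap.mean())

# With skipna=False the missing value propagates into the result
print(with_gap.mean(skipna=False))

# count() reports how many non-missing values there are
print(with_gap.count())
```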
