# Pandas
Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

`pandas` is well suited for many different kinds of data:

* Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
* Ordered and unordered (not necessarily fixed-frequency) time series data
* Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
* Any other form of observational / statistical data sets. The data need not be labeled at all to be placed into a pandas data structure

The two primary data structures of pandas, **Series** (1-dimensional) and **DataFrame** (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering.


By convention, pandas is imported as follows:

In [None]:
# Import pandas
import pandas as pd


Let's import NumPy too:

In [None]:
# Import numpy
import numpy as np


## Series

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.

A Series is, in essence, a single column of a DataFrame. You can assign row labels to a Series using the `index` parameter. However, a Series does not have column names; it has only one overall name.
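As a minimal sketch of the points above, the following builds a labeled, named Series and reads one element by label and by position. The values, labels, and the name `measurement` are made up for illustration and are not part of the exercises below.

```python
import pandas as pd

# Hypothetical example data: three values with the labels 'x', 'y', 'z'.
s = pd.Series([1.5, 2.5, 3.5], index=['x', 'y', 'z'], name='measurement')

by_label = s['y']        # label-based lookup
by_position = s.iloc[1]  # position-based lookup of the same element

# Converting to a DataFrame turns the Series name into the single column name.
frame = s.to_frame()

print(by_label, by_position, s.name)
print(list(frame.columns))
```

Here `s['y']` and `s.iloc[1]` refer to the same element, which shows that the labels sit alongside the usual positional order rather than replacing it.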

### Creating a Series

In [None]:
# create list, array and dictionary 
labels = ['a','b','c']
my_list = [10,20,30]
arr = np.array([10,20,30])
d = {'a':10,'b':20,'c':30}

Using a list

In [None]:
# Create pandas series using my_list


In [None]:
# Create pandas series using my_list and with labels as index


Using NumPy arrays

In [None]:
# Create pandas series using numpy array


In [None]:
# Create pandas series using numpy array and labels


Using a dictionary

In [None]:
# Create pandas series using dictionary


A Series can hold a variety of object types

In [None]:
# Create pandas series with string objects


Using an Index

In [None]:
# Create a series with values [3,4,5] with ['2019', '2020', '2021'] as index


In [None]:
# Print the series


In [None]:
# Show the value for index '2020'


Operations on Series are aligned by index label rather than by position
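To illustrate index-based operations, here is a small sketch with two made-up Series (`q1` and `q2` are hypothetical names) whose labels only partly overlap. Adding them matches values by label, and labels present in only one Series produce `NaN`.

```python
import pandas as pd

# Hypothetical data: the labels 'b' and 'c' appear in both Series.
q1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
q2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# Values are matched by label; 'a' and 'd' have no partner and become NaN.
total = q1 + q2

# Series.add with fill_value treats a missing partner as 0 instead.
filled = q1.add(q2, fill_value=0)

print(total)
print(filled)
```

Note that the result is a float Series even though both inputs hold integers, because `NaN` is a floating-point value.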

In [None]:
# Create a second series with values [1,2,3] and index ['2017', '2018', '2019']


In [None]:
# Display the second series


In [None]:
# Add the two series 


Giving a name to the series

In [None]:
# Create a Series and name it by passing name='Product A'


## DataFrame
A DataFrame is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.

We can think of a DataFrame as a bunch of Series objects put together to share the same index.
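One way to see this relationship is to build a DataFrame from two Series that share an index and then pull a column back out. The names `height`, `weight`, and `people` and their values are invented for this sketch.

```python
import pandas as pd

# Two hypothetical Series sharing the index ['ann', 'bob'].
height = pd.Series([1.70, 1.82], index=['ann', 'bob'])
weight = pd.Series([65, 80], index=['ann', 'bob'])

# Each dictionary key becomes a column name; the shared index becomes the row labels.
people = pd.DataFrame({'height_m': height, 'weight_kg': weight})

# Selecting a single column gives back a Series.
col = people['height_m']

print(people)
print(type(col))
```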




In [None]:
# Create a pandas dataframe with the dictionary {'Yes': [50, 20], 'No':[100,200]}
pd.DataFrame({'Yes': [50, 20], 'No':[100,200]})

In [None]:
# Create a pandas dataframe with the dictionary {'Yes': [50, 20], 'No':[100,200]} and index ['RAI 1', 'RAI 2']


We can also create a DataFrame from a list (or array) of values, passing the index and column labels as lists
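A minimal sketch of this construction, using a small nested list with made-up row and column labels (the name `scores` is hypothetical). A NumPy array of the same shape can be passed in the same way.

```python
import pandas as pd

# Hypothetical 3x2 data: each inner list is one row.
values = [[1, 2], [3, 4], [5, 6]]

scores = pd.DataFrame(values, index=['r1', 'r2', 'r3'], columns=['c1', 'c2'])

print(scores)
print(scores.loc['r2', 'c2'])  # → 4
```

The length of `index` must match the number of rows and the length of `columns` must match the number of values per row; otherwise pandas raises a `ValueError`.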

In [None]:
# Create an array of random numbers with shape (5, 4) and store it in random_values


In [None]:
# Display the random values


In [None]:
# Create the dataframe with the random_values with index ['Thailand', 'Malaysia', 'Japan', 'Laos', 'Vietnam'] and columns ['Dell', 'Samsung', 'Apple', 'HP']
df = pd.DataFrame(random_values, index=['Thailand', 'Malaysia', 'Japan', 'Laos', 'Vietnam'], columns=['Dell', 'Samsung', 'Apple', 'HP'])

In [None]:
# Display the dataframe


### Selection and Indexing

Let's learn the various ways to grab data from a DataFrame
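As a rough preview of column selection, the sketch below uses a small invented DataFrame named `sales` rather than the `df` from the exercises. It shows single-column selection, multi-column selection, attribute access, and creating a new column.

```python
import pandas as pd

# Hypothetical data, separate from the exercise DataFrame.
sales = pd.DataFrame(
    {'north': [3, 5], 'south': [4, 1], 'east': [7, 2]},
    index=['jan', 'feb'],
)

one = sales['north']             # a single column name returns a Series
two = sales[['north', 'east']]   # a list of column names returns a DataFrame
dot = sales.north                # attribute access, equivalent to sales['north']

# Assigning to a new column name adds that column.
sales['total'] = sales['north'] + sales['south'] + sales['east']

print(type(one), type(two))
print(sales)
```

Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so bracket notation is the safer default.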

In [None]:
# Select only the Dell column using bracket indexing


In [None]:
# Check the type of data returned by the above code


In [None]:
# Select the columns Dell and HP


In [None]:
# Select a column using attribute (dot) access


In [None]:
# Creating a new column


In [None]:
# Display the df


### Deleting a column

Note: If `inplace` is `False` (the default), `drop` does not delete the data from the original DataFrame; it returns a new DataFrame without the column.
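The difference can be checked on a small throwaway DataFrame (`df_demo` is a hypothetical name used only in this sketch):

```python
import pandas as pd

df_demo = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['r1', 'r2'])

# Default (inplace=False): a new DataFrame is returned, the original is unchanged.
without_b = df_demo.drop('b', axis=1)
cols_after_default_drop = list(df_demo.columns)

# inplace=True: the original is modified and the call returns None.
result = df_demo.drop('b', axis=1, inplace=True)
cols_after_inplace_drop = list(df_demo.columns)

# Rows are dropped the same way with axis=0 (the default axis).
without_r1 = without_b.drop('r1', axis=0)

print(cols_after_default_drop)   # → ['a', 'b']
print(cols_after_inplace_drop)   # → ['a']
print(result)                    # → None
```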

In [None]:
# Delete the above created column using drop command


In [None]:
# Display the df


### Deleting a Row

In [None]:
# Delete the row Japan


In [None]:
# Display the df


### Selecting Rows
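Rows can be selected by label with `.loc` or by integer position with `.iloc`. The following sketch uses an invented DataFrame named `temps`; its labels and values are assumptions for illustration only.

```python
import pandas as pd

temps = pd.DataFrame({'low': [22, 18], 'high': [31, 27]}, index=['mon', 'tue'])

by_label = temps.loc['tue']       # row selected by its index label
by_position = temps.iloc[1]       # the same row selected by integer position
cell = temps.loc['mon', 'high']   # a single value: row label, then column label

print(by_label)
print(cell)  # → 31
```

A single selected row comes back as a Series whose index is the DataFrame's column labels.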

In [None]:
# Select the row Thailand


In [None]:
# Select based on position


### Conditional Selection

Similar to NumPy, pandas supports conditional selection using bracket notation
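A short sketch of how this works, on a made-up DataFrame called `grid`: a comparison produces a Boolean mask, and passing that mask inside brackets filters the data.

```python
import pandas as pd

# Hypothetical data with a mix of positive and negative values.
grid = pd.DataFrame(
    {'p': [1.0, -2.0, 3.0], 'q': [-1.0, 5.0, -4.0]},
    index=['r1', 'r2', 'r3'],
)

mask = grid > 0                # DataFrame of True/False with the same shape
positives = grid[mask]         # values where the mask is False become NaN
rows = grid[grid['p'] > 0]     # a Boolean Series keeps only the matching rows

print(mask)
print(positives)
print(rows)
```

Masking with a whole DataFrame keeps the shape and blanks out the non-matching cells, whereas masking with a single Boolean column drops the non-matching rows entirely.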

In [None]:
# Display the df first


In [None]:
# Show a Boolean mask of the positions where value > 0


In [None]:
# Show the values where value > 0


In [None]:
# Display the rows where Dell has values greater than 0


In [None]:
# Display the columns Dell and HP and rows where Dell has values greater than 0


We can use `|` (or) and `&` (and) to combine multiple conditions, with each condition wrapped in parentheses
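The parentheses matter because `&` and `|` bind more tightly than comparison operators such as `>`. A minimal sketch on an invented DataFrame (`grid` and its values are assumptions for illustration):

```python
import pandas as pd

grid = pd.DataFrame(
    {'p': [1.0, -2.0, 3.0, -1.0], 'q': [2.0, 5.0, -4.0, -3.0]},
    index=['r1', 'r2', 'r3', 'r4'],
)

# Rows where both columns are positive.
both = grid[(grid['p'] > 0) & (grid['q'] > 0)]

# Rows where at least one column is positive.
either = grid[(grid['p'] > 0) | (grid['q'] > 0)]

# A row condition combined with a column selection via .loc.
subset = grid.loc[grid['p'] > 0, ['q']]

print(both)
print(either)
print(subset)
```

Python's `and` and `or` keywords do not work here: they try to reduce a whole Series to a single truth value, which raises a `ValueError`.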

In [None]:
# Display the rows where Dell > 0 and Apple > 0
