# Pandas basics with Python
### Pandas is one of the most popular Python libraries for Data Science and Analytics

### Pandas helps you manage two-dimensional data tables in Python, and it offers much more besides. With pandas you get to know your data by cleaning, transforming, and analyzing it. Pandas is built on top of NumPy, so much of NumPy's structure is reused or mirrored in pandas. Data held in pandas is often fed into statistical analysis in SciPy, plotting functions in Matplotlib, and machine learning algorithms in scikit-learn.


#### `Install pandas`

In [1]:
# The "!" prefix runs a shell command from a Jupyter cell
!pip install pandas

import pandas as pd

#### `Core components of pandas: Series and DataFrame`
#### `The two primary components of pandas are the Series and the DataFrame. A Series is essentially a single column, and a DataFrame is a two-dimensional table made up of a collection of Series that share the same index.`
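To make the Series/DataFrame relationship concrete, here is a minimal sketch that builds the two fruit columns as standalone Series, combines them into a DataFrame, and pulls one column back out. The variable names `apples`, `oranges`, and `column` are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# A Series is a one-dimensional labeled array: values plus an index
apples = pd.Series([3, 2, 0, 1], name='apples')
oranges = pd.Series([0, 3, 7, 2], name='oranges')

# A DataFrame can be built from Series that share the same index
fruits = pd.DataFrame({'apples': apples, 'oranges': oranges})

# Selecting a single column gives back a Series
column = fruits['apples']
print(type(column).__name__)
print(fruits.shape)
```

Both Series use the default index 0..3, so their rows line up one to one inside the DataFrame.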

#### `A simple dataframe`

In [59]:
data = {
    'apples': [3, 2, 0, 1], 
    'oranges': [0, 3, 7, 2]
}
fruits = pd.DataFrame(data)
print(fruits)

   apples  oranges
0       3        0
1       2        3
2       0        7
3       1        2


In [61]:
# Use location names as the row labels (the index) instead of the default 0..3

fruits = pd.DataFrame(data, index=['Nagpur', 'Italy', 'Pune', 'India'])


***
#### `Read from CSV`

In [62]:
# Read a CSV file; replace 'data.csv' with the path to your own file
df = pd.read_csv('data.csv')

# Use the first column of the file as the index (row labels)
df = pd.read_csv('data.csv', index_col=0)
df

        apples  oranges
Nagpur       2        5
Italy        3        2
Pune         2        1
India        6        1
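The reason for `index_col=0` is easiest to see on a small in-memory file. This sketch assumes `data.csv` has an empty first header cell (the layout `to_csv` writes). The CSV text is hypothetical and is passed through `io.StringIO`, which `read_csv` accepts like a file.

```python
import io
import pandas as pd

# Hypothetical CSV text in the assumed shape of data.csv: the first header cell is empty
csv_text = ",apples,oranges\nNagpur,2,5\nItaly,3,2\n"

# Without index_col, the label column becomes an ordinary column named 'Unnamed: 0'
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns.tolist())

# With index_col=0, the first column becomes the row labels instead
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed.index.tolist())
```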


***
#### `Read from JSON`
#### `orient` tells pandas how the JSON is laid out; with `orient="index"`, each top-level key becomes a row label. More [info](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html)

In [63]:
# Replace 'data.json' with the path to your own file
df = pd.read_json('data.json', orient="index")
df

        apples  oranges
India        1        6
Italy        1        4
Nagpur       2        3
Pune         0        5
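A minimal sketch of the JSON layout that `orient="index"` expects. The JSON text is hypothetical, not the contents of the notebook's `data.json`. Wrapping literal JSON in `io.StringIO` lets `read_json` treat it like a file.

```python
import io
import pandas as pd

# Hypothetical JSON: top-level keys are row labels, nested keys are column names
json_text = '{"Nagpur": {"apples": 2, "oranges": 3}, "Pune": {"apples": 0, "oranges": 5}}'

df_json = pd.read_json(io.StringIO(json_text), orient='index')
print(df_json)

# Writing with the same orient produces the same nested row-label layout
print(df_json.to_json(orient='index'))
```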


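#### `Saving a DataFrame back to CSV or JSON`

In [8]:
# Write the current df to disk; the row labels (index) are saved with the data.
# orient='index' matches the layout that read_json(orient="index") reads above.
df.to_csv('new_data.csv')
df.to_json('new_data.json', orient='index')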
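A round-trip sketch, assuming a small hand-built DataFrame and a temporary directory so no files are left behind. The file names mirror the cell above. Reading back with the matching options (`index_col=0` for CSV, the same `orient` for JSON) restores the row labels.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'apples': [2, 3], 'oranges': [5, 2]}, index=['Nagpur', 'Italy'])

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, 'new_data.csv')
    json_path = os.path.join(tmp, 'new_data.json')

    # to_csv writes the index as the first (unnamed) column by default
    df.to_csv(csv_path)
    from_csv = pd.read_csv(csv_path, index_col=0)

    # orient='index' writes one JSON object per row label
    df.to_json(json_path, orient='index')
    from_json = pd.read_json(json_path, orient='index')

print(from_csv.equals(df))
print(from_json.loc['Italy', 'oranges'])
```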

***
#### `Important Operations`

In [65]:
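# Note: a notebook cell only displays its last expression (df below);
# df.info() prints directly, so both appear in the output

# first n rows (n defaults to 5)
df.head(n=5)

# last n rows (n defaults to 5)
df.tail()

# 2 randomly chosen rows
df.sample(2)

# concise summary: index range, column dtypes, non-null counts, memory usage
df.info()

# summary statistics for numeric columns (count, mean, std, min, quartiles, max)
df.describe()

# dimensions as a (rows, columns) tuple
df.shape

# column labels
df.columns

# rename columns; inplace=True modifies df instead of returning a new DataFrame
df.rename(columns={'apples': 'apple', 'oranges': 'orange'}, inplace=True)
# or assign the full list of new names, in column order
df.columns = ['apple', 'orange']
df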

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, India to Pune
Data columns (total 2 columns):
apple     4 non-null int64
orange    4 non-null int64
dtypes: int64(2)
memory usage: 96.0+ bytes


        apple  orange
India       1       6
Italy       1       4
Nagpur      2       3
Pune        0       5
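Outside a notebook (or in the middle of a cell) nothing is displayed automatically, so results have to be printed. This sketch rebuilds the renamed df shown above by hand and prints a few of the results. It also shows that `rename` without `inplace=True` leaves the original untouched.

```python
import pandas as pd

# Rebuilt by hand from the output above
df = pd.DataFrame({'apple': [1, 1, 2, 0], 'orange': [6, 4, 3, 5]},
                  index=['India', 'Italy', 'Nagpur', 'Pune'])

print(df.shape)
print(df.columns.tolist())

# describe() returns a DataFrame whose rows are the summary statistics
stats = df.describe()
print(stats.loc['mean'])

# Without inplace=True, rename returns a new DataFrame and df keeps its names
renamed = df.rename(columns={'apple': 'apples'})
print(renamed.columns.tolist())
print(df.columns.tolist())
```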


#### `Data preprocessing`

In [25]:
# drop duplicate rows; inplace=True modifies df itself
df.drop_duplicates(inplace=True)

# count missing (NaN) values in each column
df.isnull().sum()

# drop rows that contain any NaN (returns a new DataFrame; df is unchanged)
df.dropna()

# drop columns that contain any NaN by setting axis=1
df.dropna(axis=1)
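The fruit data above has no gaps or duplicates, so these calls change nothing there. The sketch below uses a hypothetical variant of the table with one missing value and one repeated row so that each call has a visible effect.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'Italy' repeats 'India', and 'Nagpur' has a missing apple count
# (the NaN forces the 'apple' column to float64)
raw = pd.DataFrame(
    {'apple': [1, 1, np.nan, 0], 'orange': [6, 6, 3, 5]},
    index=['India', 'Italy', 'Nagpur', 'Pune'],
)

missing = raw.isnull().sum()      # NaN count per column
deduped = raw.drop_duplicates()   # compares column values only, not the index
rows_kept = raw.dropna()          # drops the 'Nagpur' row
cols_kept = raw.dropna(axis=1)    # drops the 'apple' column

print(missing)
print(deduped.index.tolist())
print(rows_kept.index.tolist())
print(cols_kept.columns.tolist())
```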

***
#### `Data aggregation`

In [66]:
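# count non-null values in each column
df.count()

# count for one column: double brackets select a one-column DataFrame,
# so the result is a Series with a single entry
df[['apple']].count()
# OR single brackets / attribute access select a Series, so the result is a scalar
df.apple.count()

# sum of each column
df.sum()

# sum of a specific column (Series result, as above)
df[['apple']].sum()
# OR as a scalar
df.apple.sum()

# min, max, mean, median of a column
df.apple.min()
df.apple.max()
df.apple.mean()
df.apple.median()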

1.0
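A short sketch contrasting the two selection styles and computing several statistics in one call with `Series.agg`, using the renamed fruit table rebuilt by hand. `agg` is a standard pandas method, but it is an addition here, not part of the original cell.

```python
import pandas as pd

df = pd.DataFrame({'apple': [1, 1, 2, 0], 'orange': [6, 4, 3, 5]},
                  index=['India', 'Italy', 'Nagpur', 'Pune'])

col_total = df[['apple']].sum()   # Series with a single entry labelled 'apple'
scalar_total = df.apple.sum()     # plain scalar

# agg() applies several aggregation functions and returns them as one Series
summary = df.apple.agg(['min', 'max', 'mean', 'median'])
print(summary)
```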

***
#### `Slicing, extracting and indexing`

In [68]:
# COLUMN selection

# a list of column names returns a DataFrame
df[['apple', 'orange']]

# a single column name returns a Series
# (remember, a DataFrame is a collection of Series)
df['apple']
# or, with attribute access (works when the name is a valid Python identifier)
df.apple

# ROW selection
# loc  -> selects by label (index name)
# iloc -> selects by integer position
df.loc['Pune']

df.iloc[1:3]    # rows at positions 1 and 2; the end position is excluded

# Conditional selection: | is element-wise OR, and each comparison needs
# parentheses because | binds more tightly than > in Python
df[(df.apple > 1) | (df.orange > 4)]

        apple  orange
India       1       6
Nagpur      2       3
Pune        0       5
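One detail worth knowing about `loc` versus `iloc`: label slices include their end label, while position slices exclude their end position. This sketch rebuilds the fruit table by hand, shows both slices selecting the same two rows, and combines conditions with `&` (element-wise AND).

```python
import pandas as pd

df = pd.DataFrame({'apple': [1, 1, 2, 0], 'orange': [6, 4, 3, 5]},
                  index=['India', 'Italy', 'Nagpur', 'Pune'])

by_label = df.loc['Italy':'Nagpur']    # label slice: end label included
by_position = df.iloc[1:3]             # position slice: end position excluded
print(by_label.equals(by_position))

# loc also accepts a row label and a column label together
print(df.loc['Pune', 'orange'])

# & is element-wise AND; each comparison still needs parentheses
both = df[(df.apple >= 1) & (df.orange >= 4)]
print(both.index.tolist())
```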
