<img src="PANDAS.png" width=1500 height=500 />


For reference, see the [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html).

# Data analysis with Python - **Pandas**

- Pandas is a Python library that provides data structures and data analysis tools for handling and manipulating numerical tables and time series data.
- It is built on top of NumPy, the popular numerical computing library, and is widely used for data preparation and wrangling tasks in data science and machine learning workflows.
- Some of the key features of pandas include its fast and efficient handling of large datasets, powerful data manipulation and cleaning capabilities, and support for a wide range of file formats and data sources.
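
As a minimal sketch of the relationship with NumPy described above, the following wraps a NumPy array in a `DataFrame` and pulls out a single column as a `Series`. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: a 3x2 NumPy array (values are arbitrary)
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Wrap the array in a DataFrame with hypothetical column names
df = pd.DataFrame(values, columns=["a", "b"])

# Selecting a single column returns a Series
col_a = df["a"]

# The data can be handed back to NumPy as a plain array
round_trip = df.to_numpy()
print(type(col_a).__name__, round_trip.shape)
```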

## Main Features of Pandas

The main features of the Pandas library are:

- **Easy handling of missing data:** Missing data (represented as `NaN`, `NA`, or `NaT`) is handled consistently in floating-point as well as non-floating-point data
- **Size mutability:** columns can be inserted and deleted from DataFrame and higher dimensional objects
- **Automatic and explicit data alignment:** objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let `Series`, `DataFrame`, etc. automatically align the data for you in computations
- **Groupby:** Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
- **Data conversion:** Makes it easy to convert ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
- **Data manipulation:** 
    - Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
    - Intuitive merging and joining data sets
    - Flexible reshaping and pivoting of data sets
    - Hierarchical labeling of axes (possible to have multiple labels per tick)
- **Robust I/O tools:** Loading data from flat files (CSV and other delimited formats), Excel files, and databases, and saving/loading data in the ultrafast `HDF5` format
- **Time series-specific functionality:** date range generation and frequency conversion, moving window statistics, date shifting and lagging.
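
A few of the features listed above can be sketched in a short example. The data, column names, and fill value below are hypothetical and chosen only for illustration. This is one possible way to exercise these features, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing value
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "units": [10.0, np.nan, 30.0, 40.0],
    }
)

# Missing data: count NaN values, then fill them with a chosen default (0 here)
n_missing = int(sales["units"].isna().sum())
filled = sales.fillna({"units": 0.0})

# Size mutability: insert a derived column, then delete it again
filled["double_units"] = filled["units"] * 2
filled = filled.drop(columns="double_units")

# Automatic alignment: values are matched on labels, not on positions
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])
aligned_sum = s1 + s2  # "a" and "d" have no partner, so they become NaN

# Groupby: split by region, apply a sum, combine the results
units_by_region = filled.groupby("region")["units"].sum()

# Time series: generate a daily date range and compute a moving average
dates = pd.date_range("2024-01-01", periods=4, freq="D")
daily = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)
rolling_mean = daily.rolling(window=2).mean()
```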



![Python](https://img.shields.io/badge/python-3670A0?style=flat&logo=python&logoColor=ffdd54) ![Anaconda](https://img.shields.io/badge/Anaconda-%2344A833.svg?style=flat&logo=anaconda&logoColor=white) 
![NumPy](https://img.shields.io/badge/numpy-%23013243.svg?style=flat&logo=numpy&logoColor=white) ![Pandas](https://img.shields.io/badge/pandas-%23150458.svg?style=flat&logo=pandas&logoColor=white)

## NumPy vs Pandas

<table><tr>
<td> <img src="Numpy-1.png" alt="Drawing" style="width: 650px;"/> </td>
<td> <img src="Pandas-1.png" alt="Drawing" style="width: 650px;"/> </td>
</tr></table>

[Image reference](https://favtutor.com/blogs/numpy-vs-pandas)

| **Comparison Parameter** |  **NumPy** | **Pandas** |
|----------------------|--------|--------|
| **Powerful Tool** | _NumPy's core object is the N-dimensional array (`ndarray`)_ | _Pandas' core objects are the `DataFrame` and the `Series`_ |
| **Memory Consumption** | _NumPy is memory efficient_ | _Pandas generally consumes more memory_ |
| **Data Compatibility** | _Works with numerical data_ | _Works with tabular data_ |
| **Performance** | _Reported to perform better when the number of rows is 50K or less_ | _Reported to perform better when the number of rows is 500K or more_ |
| **Speed** | _Arrays are faster than data frames_ | _Data frames are relatively slower than arrays_ |
| **Data Object** | _Creates N-dimensional objects_ | _Creates 1D (`Series`) and 2D (`DataFrame`) objects_ |
| **Type of Data** | _Homogeneous data type_ | _Heterogeneous data types_ |
| **Access Methods** | _Using index position only_ | _Using index position or index labels_ |
| **Indexing** | _Indexing in NumPy arrays is very fast_ | _Indexing in Pandas Series is comparatively slow_ |
| **Operations** | _Has no built-in grouping utilities_ | _Provides special utilities such as `groupby` to access and manipulate subsets_ |
| **External Data** | _Generally used with data created by the user or by built-in functions_ | _Pandas objects are often created from external data such as CSV, Excel, or SQL_ |
| **Application** | _NumPy is popular for numerical calculations_ | _Pandas is popular for data analysis and visualizations_ |
| **Usage in ML and AI** | _Toolkits such as TensorFlow and scikit-learn work with NumPy arrays_ | _Pandas objects are typically converted to NumPy arrays, for example with `to_numpy()`, before or while being passed to such toolkits_ |
| **Core Language** | _NumPy was initially written in C_ | _Pandas is written in Python, Cython, and C; its `DataFrame` is comparable to R's `data.frame`_ |
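
Some rows of the comparison can be illustrated with a small sketch, here the data type and access method rows. The array contents, column names, and row labels are hypothetical.

```python
import numpy as np
import pandas as pd

# NumPy: a homogeneous array, accessed by integer position
arr = np.array([[1, 2], [3, 4]])
first_row_np = arr[0]   # position-based access
arr_dtype = arr.dtype   # a single dtype for the whole array

# Pandas: heterogeneous columns, accessed by position or by label
df = pd.DataFrame(
    {"name": ["x", "y"], "score": [1.5, 2.5]},
    index=["row1", "row2"],  # hypothetical row labels
)
by_label = df.loc["row2", "score"]  # label-based access
by_position = df.iloc[1, 1]         # position-based access
col_dtypes = df.dtypes              # one dtype per column
```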


Python libraries like NumPy and Pandas are often used together for data manipulations and numerical operations.
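
As a small sketch of the two libraries working together, NumPy functions can be applied directly to Pandas objects. The column names, values, and the threshold of 2 are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements
df = pd.DataFrame({"x": [1.0, 4.0, 9.0], "y": [16.0, 25.0, 36.0]})

# NumPy universal functions accept Pandas objects and keep the labels
roots = np.sqrt(df)

# NumPy helpers can build new columns from existing ones
df["size"] = np.where(df["x"] > 2, "large", "small")
```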

For more details on the NumPy library, please see the NumPy notebook in my GitHub repository: [NumPy notebook on GitHub](https://github.com/arunsinp/Python-programming/blob/main/Python-fundamental/Numpy-tutorial.ipynb)

```python
# Import the Pandas library into the notebook
import pandas as pd
# NumPy is usually imported alongside it
import numpy as np
```

## 1. Input/output
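
A minimal sketch of reading and writing CSV data is shown below. It uses an in-memory string instead of a file so that it runs on its own. The CSV content and column names are hypothetical, and in practice a file path would usually be passed to `pd.read_csv` and `DataFrame.to_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; normally this would live in a file on disk
csv_text = "name,score\nalice,1.5\nbob,2.5\n"

# Read: parse the CSV text into a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# Write: serialize the DataFrame back to CSV text without the row index
buffer = io.StringIO()
df.to_csv(buffer, index=False)
written = buffer.getvalue()
```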



# References

1. https://pandas.pydata.org/docs/user_guide/index.html#user-guide
2. https://pandas.pydata.org/docs/