# Getting started with Pandas

<strong>pandas</strong> is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

[read more](https://pandas.pydata.org/pandas-docs/stable/getting_started/overview.html)

```python
import pandas
```

In practice, pandas is usually imported under the alias `pd`, together with NumPy as `np`:

```python
import numpy as np
import pandas as pd
```

Check the installed pandas version:

```python
pd.__version__
```

Show the versions of pandas, its dependencies, and some system information:

```python
pd.show_versions()
```

In IPython or Jupyter, append `?` to a function or method to display its documentation, for example:

```python
# IPython/Jupyter syntax only; in a plain Python session use help(pd.read_csv)
pd.read_csv?
```
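
Outside IPython and Jupyter the `?` suffix is a syntax error. A minimal sketch of a standard-library alternative, `inspect.signature`, which works in any Python session (the variable names are illustrative):

```python
import inspect

import pandas as pd

# Inspect the call signature of pd.read_csv without IPython's `?` syntax
sig = inspect.signature(pd.read_csv)

# Parameter names in declaration order, starting with filepath_or_buffer
params = list(sig.parameters)
print(params[:5])

# help(pd.read_csv) prints the full docstring in a plain Python session.
```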

## Primary data structures

The two primary data structures of pandas, [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series) (1-dimensional) and [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. 

| Dimensions | Name      | Description |
|:-----------|:----------|:------------|
| 1          | Series    | 1D labeled, homogeneously-typed array |
| 2          | DataFrame | General 2D labeled, size-mutable tabular structure with potentially heterogeneously-typed columns |
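
To make the table concrete, here is a minimal sketch (the column names and values are illustrative, not from the original) that checks the dimensionality of each structure and shows that selecting a single DataFrame column yields a Series:

```python
import pandas as pd

# A Series is one-dimensional: a single array of values plus an index
s = pd.Series([1.5, 2.5, 3.5], name="price")

# A DataFrame is two-dimensional; each column can have its own dtype
df = pd.DataFrame({"price": [1.5, 2.5, 3.5], "qty": [10, 20, 30]})

print(s.ndim, df.ndim)  # 1 2
print(df.dtypes)

# Selecting one column of a DataFrame returns a Series
col = df["qty"]
print(type(col).__name__)  # Series
```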

### Series

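Create a Series by passing a list of values. pandas adds a default integer index (0 to n-1), and because the list contains `np.nan`, the values are stored as `float64`: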

```python
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s
```

```
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
```
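
The index does not have to be the default range. A short sketch, using illustrative string labels that are not part of the original, of passing explicit labels and handling the missing value:

```python
import numpy as np
import pandas as pd

# Explicit string labels instead of the default 0..n-1 index
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=list("abcdef"))

print(s["c"])          # label-based access: 5.0
print(s.isna().sum())  # number of missing values: 1
print(s.mean())        # NaN is skipped by default: 4.6
print(s.dropna().size) # 5 non-missing values remain
```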

### DataFrame

First build a datetime index with `pd.date_range`, then create a DataFrame by passing a NumPy array together with that index and labeled columns:

```python
dates = pd.date_range('20190101', periods=6)
dates
```

```
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06'],
              dtype='datetime64[ns]', freq='D')
```
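
`pd.date_range` defaults to a daily frequency. A minimal sketch (dates chosen for illustration) of two other common ways to call it: a custom `freq` string, and a `start`/`end` pair instead of `periods`:

```python
import pandas as pd

# Same start date, but every second day instead of daily
every_other_day = pd.date_range("2019-01-01", periods=3, freq="2D")
print(every_other_day)

# Give a start and an end instead of a number of periods (daily by default)
first_week = pd.date_range(start="2019-01-01", end="2019-01-07")
print(len(first_week))  # 7
```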

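```python
# np.random.randn draws new random numbers on every run, so your values will differ
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df
```

|            | A         | B         | C         | D         |
|:-----------|----------:|----------:|----------:|----------:|
| 2019-01-01 | -0.91411  | 0.224591  | 0.597787  | -1.192201 |
| 2019-01-02 | 1.460725  | 0.907776  | -0.283707 | 0.614184  |
| 2019-01-03 | -1.180138 | -2.027088 | -0.29995  | 0.256295  |
| 2019-01-04 | 1.25665   | 0.597683  | 0.040225  | -1.762578 |
| 2019-01-05 | -0.288866 | -0.179633 | -0.446499 | -2.127886 |
| 2019-01-06 | -0.960549 | 0.18298   | 1.249889  | -0.262175 |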
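
Because `np.random.randn` produces different numbers on every run, the sketch below swaps in a seeded `np.random.default_rng` generator (an assumption, not part of the original) so the values are reproducible, then shows a few common ways to inspect and select from the frame:

```python
import numpy as np
import pandas as pd

# Seeded generator so the random values are identical on every run
rng = np.random.default_rng(42)
dates = pd.date_range("20190101", periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

print(df.shape)          # (6, 4): rows, columns
print(df.head(2))        # first two rows
print(list(df.columns))  # ['A', 'B', 'C', 'D']

# Label-based selection: one cell by row label and column label
value = df.loc[pd.Timestamp("2019-01-02"), "A"]

# Position-based selection of the same cell (second row, first column)
same_value = df.iloc[1, 0]
print(value == same_value)  # True
```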


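You can also create a DataFrame by passing a dict of objects, where each key becomes a column label and each value is converted into that column's data. Scalars such as `1.` and `'foo'` are broadcast to the length of the other columns: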

```python
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20190102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo',
                    'G': np.random.randn(4),
                    'H': pd.date_range('20190101', periods=4)})
df2
```

|   | A   | B          | C   | D | E     | F   | G         | H          |
|--:|----:|:-----------|----:|--:|:------|:----|----------:|:-----------|
| 0 | 1.0 | 2019-01-02 | 1.0 | 3 | test  | foo | -0.553079 | 2019-01-01 |
| 1 | 1.0 | 2019-01-02 | 1.0 | 3 | train | foo | 1.61444   | 2019-01-02 |
| 2 | 1.0 | 2019-01-02 | 1.0 | 3 | test  | foo | 0.403572  | 2019-01-03 |
| 3 | 1.0 | 2019-01-02 | 1.0 | 3 | train | foo | 0.175663  | 2019-01-04 |
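
Broadcasting only applies to scalars. A minimal sketch (column names are illustrative) showing a scalar being repeated, and list-like values of different lengths being rejected with a `ValueError`:

```python
import pandas as pd

# The scalar "foo" is broadcast to the length of the list column
df = pd.DataFrame({"x": [1, 2, 3], "label": "foo"})
print(df["label"].tolist())  # ['foo', 'foo', 'foo']

# List-like columns of different lengths cannot be combined
error = None
try:
    pd.DataFrame({"x": [1, 2, 3], "y": [1, 2]})
except ValueError as err:
    error = err
    print("ValueError:", err)
```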


The columns of the resulting DataFrame have different dtypes.

```python
df2.dtypes
```

```
A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
G           float64
H    datetime64[ns]
dtype: object
```
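
Once the dtypes are known, you can filter or convert columns by type. A minimal sketch on a small illustrative frame (not `df2` itself) using `select_dtypes` to keep numeric columns and `astype` to convert one column:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [0.5, 1.5, 2.5],
    "c": pd.Categorical(["test", "train", "test"]),
})

# Keep only the numeric columns (integers and floats; categoricals are excluded)
numeric = df.select_dtypes(include="number")
print(list(numeric.columns))  # ['a', 'b']

# Convert column "a" to float64; astype returns a new DataFrame
df_float = df.astype({"a": "float64"})
print(df_float.dtypes)
```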