## 10 Minutes to pandas

This is a short introduction to pandas, geared mainly toward new users. You can find more complex recipes in the [Cookbook](https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html#cookbook).

Customarily, we import as follows:

In [1]:
import numpy as np
import pandas as pd

### Object Creation
See the [Data Structure Intro section](https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html#dsintro).

Creating [a Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series) by passing a list of values, letting pandas create a default integer index:

In [13]:
s1 = pd.Series([1, 3, 5, np.nan, 6, 8])
s2 = pd.Series(np.random.randn(6)*10)
print(s1)
print(s2)

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
0     1.538519
1   -11.222734
2   -17.116437
3     4.888461
4    -5.895472
5    -4.080093
dtype: float64
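
As a minimal sketch of what the default integer index means in practice, the example below compares a Series built without an index to one built with explicit labels. The labels `"a"`, `"b"`, `"c"` and the variable names are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# No index given: pandas assigns the integer positions 0..n-1.
default_idx = pd.Series([1, 3, 5, np.nan, 6, 8])

# Explicit index: these labels are hypothetical, picked for this sketch.
labeled = pd.Series([1, 3, 5], index=["a", "b", "c"])

print(default_idx.index)         # the automatically created index
print(labeled["b"])              # label-based lookup
print(default_idx.isna().sum())  # number of missing values
```

Note that the `np.nan` entry forces the first Series to `float64`, which is why the integers above print as `1.0`, `3.0`, and so on.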


### Create a DataFrame from Series
Source: [GeeksforGeeks](https://www.geeksforgeeks.org/creating-a-dataframe-from-pandas-series/)

In [17]:
# List -> Series -> DataFrame
author = ['Jitender', 'Purnima', 'Arpit', 'Jyoti'] 
article = [210, 211, 114, 178] 

auth_series = pd.Series(author) 
article_series = pd.Series(article)

result = pd.DataFrame({ 'Author': auth_series, 'Article': article_series })
result

     Author  Article
0  Jitender      210
1   Purnima      211
2     Arpit      114
3     Jyoti      178


### Creating a DataFrame by passing a NumPy array

Create a [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) by passing a NumPy array, with a datetime index and labeled columns:

In [24]:
dates = pd.date_range('20130101', periods=4)
print(dates)
print(list('ABCD'))

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04'], dtype='datetime64[ns]', freq='D')
['A', 'B', 'C', 'D']


In [25]:
arr = np.random.randn(4, 2)
print(arr)
print(pd.DataFrame(arr))
print(pd.DataFrame(arr, index=dates, columns=list('AB')))

[[ 0.87920771  1.72217584]
 [-0.55548284  1.43188895]
 [ 1.97257965  1.2434183 ]
 [ 1.0438864   0.16866866]]
          0         1
0  0.879208  1.722176
1 -0.555483  1.431889
2  1.972580  1.243418
3  1.043886  0.168669
                   A         B
2013-01-01  0.879208  1.722176
2013-01-02 -0.555483  1.431889
2013-01-03  1.972580  1.243418
2013-01-04  1.043886  0.168669
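
Because `np.random.randn` draws new values on every run, the numbers above will differ each time the cell is executed. One possible way to make the example reproducible is to use a seeded NumPy generator, sketched below. The seed value and the variable names are arbitrary choices for this illustration.

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same values on every run (seed is arbitrary).
rng = np.random.default_rng(seed=0)

dates = pd.date_range("20130101", periods=4)
df = pd.DataFrame(rng.standard_normal((4, 2)), index=dates, columns=list("AB"))

print(df.shape)
# Select one row by its timestamp label.
row = df.loc[pd.Timestamp("2013-01-02")]
print(row)
```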


### Creating a DataFrame by passing a dict of objects

Each value in the dict must be convertible to something series-like: a Series, an array, a Categorical, or a constant value, which is repeated for every row.

In [20]:
df2 = pd.DataFrame({'A': pd.date_range('20130101', periods=4),
'B': pd.Timestamp('20130102'),
'C': pd.Series(1, index=list(range(4)), dtype='float32'),
'D': np.array([3] * 4, dtype='int32'),
'E': pd.Categorical(["test", "train", "test", "train"]),
'F': 'foo',
'G': np.random.random((4))*100})
print(df2)
print(df2.shape, df2.dtypes) # The columns of the resulting DataFrame have different dtypes.

           A          B    C  D      E    F          G
0 2013-01-01 2013-01-02  1.0  3   test  foo  78.622345
1 2013-01-02 2013-01-02  1.0  3  train  foo  43.795872
2 2013-01-03 2013-01-02  1.0  3   test  foo  40.132256
3 2013-01-04 2013-01-02  1.0  3  train  foo  25.224916
(4, 7) A    datetime64[ns]
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
G           float64
dtype: object


The columns of the resulting DataFrame have different dtypes.
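
To make the mixed dtypes concrete, here is a small sketch that builds a similar frame and then keeps only its numeric columns with `select_dtypes`. The column names `when`, `flag`, `n`, and `score` are hypothetical and not part of the example above.

```python
import numpy as np
import pandas as pd

small = pd.DataFrame({
    "when": pd.date_range("20130101", periods=3),
    "flag": "foo",                                   # scalar, repeated per row
    "n": np.array([3] * 3, dtype="int32"),
    "score": pd.Series(1, index=range(3), dtype="float32"),
})

# Keep only the columns with a numeric dtype.
numeric = small.select_dtypes(include="number")
print(small.dtypes)
print(list(numeric.columns))
```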

In [15]:
# Type "df2." and press Tab: the columns A, B, C, ... are listed among the built-in attributes
df2.shape

(4, 7)
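
Columns show up in tab completion because pandas exposes them as attributes when the column name is a valid Python identifier. A minimal sketch, using a hypothetical two-column frame, shows that attribute access and bracket access return the same data:

```python
import pandas as pd

# A tiny hypothetical frame, used only to illustrate attribute access.
frame = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})

# frame.A is shorthand for frame["A"] when the name is a valid identifier.
same = frame.A.equals(frame["A"])
print(same)
print(frame.shape)
```

Bracket access is the safer general choice, since it also works for names that contain spaces or that clash with existing DataFrame attributes such as `shape`.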

Continue learning [here](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html).