# My First Pandas Walkthrough 
Date: 7/25/17
I am using the following reference:
* **[10 Minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/10min.html):** This is a short introduction to pandas, geared mainly for new users. You can see more complex recipes in the Cookbook.



Customarily, we import the following modules:

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Object Creation
Creating a **[Series](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html#pandas.Series)** (See the [Data Structure Intro Section](https://pandas.pydata.org/pandas-docs/stable/dsintro.html#dsintro)) by passing a list of values, letting pandas create a default integer index:

**Aside:** Here, I learned that Jupyter supports Tab-completion and Tab-hints. For example, below I typed **pd.Ser+Tab** and it completed to Series, and on **np.n+Tab** it gave me a list of options to choose from, which started with **np.nan**.

In [5]:
s = pd.Series([1,3,5,np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
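
The `float64` dtype in the output above comes from the `np.nan` entry: NaN is a float, so the whole Series is stored as floats. A minimal sketch to check this, where the `ints` Series is my own addition for comparison and not part of the original walkthrough:

```python
import numpy as np
import pandas as pd

# Without a missing value, a list of ints keeps an integer dtype.
# (`ints` is a hypothetical comparison Series, not from the tutorial.)
ints = pd.Series([1, 3, 5, 6, 8])

# Adding np.nan (a float) makes pandas store the whole Series as float64.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

print(ints.dtype.kind)     # 'i' (an integer dtype)
print(s.dtype)             # float64
print(s.isna().sum())      # 1 missing value
print(list(s.index))       # default integer index: [0, 1, 2, 3, 4, 5]
```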

Creating a [DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame) by passing a NumPy array, with a datetime index and labeled columns:

In [8]:
dates = pd.date_range('20130101', periods=6)
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [11]:
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
df

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | -0.01901 | 1.56485 | -1.737703 | 0.57413 |
| 2013-01-02 | 0.763763 | 0.735658 | -0.107898 | 1.454031 |
| 2013-01-03 | -0.327288 | -0.209822 | -1.243736 | 0.785043 |
| 2013-01-04 | 1.02887 | -0.210219 | -0.212972 | 0.31818 |
| 2013-01-05 | -0.525925 | -1.556931 | -0.729689 | 1.889979 |
| 2013-01-06 | 0.429888 | 0.793475 | -0.393646 | -0.340704 |
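
Because the values come from `np.random.randn`, rerunning the cell gives different numbers each time. A minimal sketch of one way to make the frame reproducible, assuming a seeded `np.random.RandomState` (the seed value `0` and the names `rng` and `df_again` are my own choices, not from the tutorial):

```python
import numpy as np
import pandas as pd

# Seeding is an assumption added for illustration; the walkthrough does not seed.
rng = np.random.RandomState(0)

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(rng.randn(6, 4), index=dates, columns=list("ABCD"))

# Rebuilding with the same seed yields the same values.
df_again = pd.DataFrame(
    np.random.RandomState(0).randn(6, 4), index=dates, columns=list("ABCD")
)

print(df.shape)              # (6, 4)
print(list(df.columns))      # ['A', 'B', 'C', 'D']
print(df.equals(df_again))   # True
```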


Creating a DataFrame by passing a dict of objects that can each be converted to something Series-like:

In [43]:
df2 = pd.DataFrame({ 'A' : 1.,
...:                 'B' : pd.Timestamp('20130102'),
...:                  'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
...:               'D' : np.array([3] * 4,dtype='int32'),
                     'E' : pd.Categorical(["test","train","test","train"]),
                     'F' : 'foo' })
 

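**Aside:** I learned here that, inside the open brackets, a continuation line can start with white space, with three dots, or with three dots followed by other characters (such as the `...:` prompt). Two dots or fewer give a syntax error. My understanding is that IPython strips its own continuation prompt from the start of input lines, which would explain why the three-dot forms are accepted.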

In [40]:
df2

| | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
| 1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
| 2 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
| 3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
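
Notice that the scalar values (`1.`, the Timestamp, `'foo'`) are repeated on every row. As far as I can tell, the row count comes from the array-like entries, and scalars are broadcast to match. A small sketch with a reduced dict (the name `small` is hypothetical and only used here):

```python
import pandas as pd

# Two scalars plus one Series of length 4: the Series sets the number of rows.
small = pd.DataFrame({
    "A": 1.0,
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "F": "foo",
})
print(small.shape)            # (4, 3)
print(small["F"].tolist())    # ['foo', 'foo', 'foo', 'foo']

# With only scalars there is nothing to take a length from,
# so pandas raises a ValueError unless an index is passed.
try:
    pd.DataFrame({"A": 1.0, "F": "foo"})
except ValueError as err:
    print("ValueError:", err)
```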


The columns of the resulting DataFrame have different [dtypes](https://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dtypes):

In [18]:
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
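
Once each column has its own dtype, columns can be filtered or converted by type. A minimal sketch using `select_dtypes` and `astype`, which go slightly beyond what the tutorial shows at this point (the names `numeric` and `d64` are my own):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",
})

# Keep only the numeric columns (the floats and ints here).
numeric = df2.select_dtypes(include=["number"])
print(list(numeric.columns))   # ['A', 'C', 'D']

# astype returns a converted copy; the original column keeps its dtype.
d64 = df2["D"].astype("int64")
print(d64.dtype)               # int64
print(df2["D"].dtype)          # int32
```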

If you’re using IPython, tab completion for column names (as well as public attributes) is automatically enabled. Here’s a subset of the attributes that will be completed:

df2.`<TAB>`

```
df2.A                  df2.bool
df2.abs                df2.boxplot
df2.add                df2.C
df2.add_prefix         df2.clip
df2.add_suffix         df2.clip_lower
df2.align              df2.clip_upper
df2.all                df2.columns
df2.any                df2.combine
df2.append             df2.combine_first
df2.apply              df2.compound
df2.applymap           df2.consolidate
df2.as_blocks          df2.convert_objects
df2.asfreq             df2.copy
df2.as_matrix          df2.corr
df2.astype             df2.corrwith
df2.at                 df2.count
df2.at_time            df2.cov
df2.axes               df2.cummax
df2.B                  df2.cummin
df2.between_time       df2.cumprod
df2.bfill              df2.cumsum
df2.blocks             df2.D
```

As you can see, the columns A, B, C, and D are automatically tab completed. E is there as well; the rest of the attributes have been truncated for brevity.
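
Column names show up in tab completion because pandas exposes them as attributes. A small sketch to check this without the Tab key, assuming completion draws on what `dir()` reports (the frame and the column name `"my col"` are hypothetical, chosen to show a name that is not a valid Python identifier):

```python
import pandas as pd

# Hypothetical frame: one identifier-like column name, one with a space.
frame = pd.DataFrame({"A": [1.0, 2.0], "my col": [3, 4]})

print("A" in dir(frame))                # True: offered as an attribute
print("my col" in dir(frame))           # False: not a valid identifier
print(frame.A.equals(frame["A"]))       # True: attribute and [] access agree
print(frame["my col"].tolist())         # [3, 4]; only [] access works here
```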

## Viewing Data
See the [Basics section](https://pandas.pydata.org/pandas-docs/stable/basics.html#basics).

See the top and bottom rows of the frame:

In [48]:
df.head()

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | -0.01901 | 1.56485 | -1.737703 | 0.57413 |
| 2013-01-02 | 0.763763 | 0.735658 | -0.107898 | 1.454031 |
| 2013-01-03 | -0.327288 | -0.209822 | -1.243736 | 0.785043 |
| 2013-01-04 | 1.02887 | -0.210219 | -0.212972 | 0.31818 |
| 2013-01-05 | -0.525925 | -1.556931 | -0.729689 | 1.889979 |


In [47]:
df.tail()

| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-02 | 0.763763 | 0.735658 | -0.107898 | 1.454031 |
| 2013-01-03 | -0.327288 | -0.209822 | -1.243736 | 0.785043 |
| 2013-01-04 | 1.02887 | -0.210219 | -0.212972 | 0.31818 |
| 2013-01-05 | -0.525925 | -1.556931 | -0.729689 | 1.889979 |
| 2013-01-06 | 0.429888 | 0.793475 | -0.393646 | -0.340704 |
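
Both calls above return five rows because that is the default; each also accepts a row count. A short sketch on a stand-in frame (the single column of `range(6)` values is my own placeholder, not the random data above):

```python
import pandas as pd

# Stand-in frame with the same six-day index as the walkthrough.
dates = pd.date_range("20130101", periods=6)
frame = pd.DataFrame({"A": range(6)}, index=dates)

print(len(frame.head()))           # 5 rows by default
print(len(frame.head(3)))          # 3
print(frame.tail(2)["A"].tolist()) # [4, 5], the last two rows
```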


# Finally
KP: Here I am just trying to find out if the [scikit-learn](http://scikit-learn.org/stable/) module is recognized and no error is thrown. I plan to try it in more detail later in another notebook.
## scikit-learn
Machine Learning in Python
* Simple and efficient tools for data mining and data analysis
* Accessible to everybody, and reusable in various contexts
* Built on NumPy, SciPy, and matplotlib
* Open source, commercially usable - BSD license

In [15]:
import sklearn
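
A bare `import sklearn` either succeeds silently or raises `ImportError`. A slightly more informative sketch, using the standard library's `importlib` to check for the package first and then print its version (the `installed` flag and the messages are my own additions):

```python
import importlib
import importlib.util

# find_spec returns None when the package cannot be located.
spec = importlib.util.find_spec("sklearn")
installed = spec is not None

if installed:
    sklearn = importlib.import_module("sklearn")
    print("scikit-learn version:", sklearn.__version__)
else:
    print("scikit-learn is not installed in this environment")
```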