# Pandas
A Python library for data analysis.

## Pandas DataFrames
A tabular data structure with labeled rows and columns.

Rows are labeled by a special data structure called an index.

### Index
A sequence of labels that permits fast lookup and some relational operations.

In the Apple (AAPL) DataFrame, the index labels are dates in reverse chronological order.

![dataframe](Images/dataframe.png)

## Working with DataFrames in memory

![indexes](Images/indexes.png)

Notice that AAPL.columns is also a pandas Index.

![datetimeindex](Images/datetimeindex.png)

In this case, the AAPL.index attribute is a special kind of index called a DatetimeIndex.
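
As a rough sketch of the index objects described above, the following builds a small stand-in DataFrame. The name `prices` and all of its values are invented for illustration; they are not the AAPL data shown in the screenshots.

```python
import pandas as pd

# Hypothetical stand-in data; the values are invented, not real AAPL prices.
dates = pd.to_datetime(["2015-01-06", "2015-01-05", "2015-01-02"])  # reverse chronological
prices = pd.DataFrame(
    {"Open": [106.5, 108.3, 111.4], "Close": [106.3, 106.2, 109.3]},
    index=dates,
)

# Both the row labels and the column labels are pandas Index objects.
print(type(prices.index))    # a DatetimeIndex, because the labels are dates
print(type(prices.columns))  # a general Index holding the column names

# Label-based lookup of a single row by its date.
row = prices.loc[pd.Timestamp("2015-01-05")]
print(row)
```

Looking up a row by its label with `.loc` is the kind of fast lookup the index is meant to support.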

DataFrames can be sliced like NumPy arrays or Python lists:
![slicing](Images/slicing.png)
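
A minimal sketch of positional slicing, assuming a small made-up DataFrame named `df` (not the data in the screenshot). It uses `.iloc`, which accepts the same `start:stop:step` slices as Python lists and NumPy arrays.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"a": [10, 20, 30, 40, 50], "b": [1.0, 2.0, 3.0, 4.0, 5.0]})

first_three = df.iloc[:3, :]   # rows at positions 0, 1, 2; all columns
last_two = df.iloc[-2:, :]     # negative positions count from the end
every_other = df.iloc[::2, 0]  # every second row of the first column (a Series)

print(first_three)
print(last_two)
print(every_other)
```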

### head() method
Another way to see the first few rows of data:
![head](Images/head.png)

### tail() method
Accessing the last 5 rows:
![tail](Images/tail.png)

### info() method
Prints a concise summary (index type, column dtypes, non-null counts, memory usage), which is useful for large DataFrames:
![info](Images/info.png)
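
The three inspection methods above can be tried on any DataFrame. The sketch below assumes a made-up 100-row frame named `df`; the column names are hypothetical.

```python
import pandas as pd

# Hypothetical 100-row DataFrame for illustration.
df = pd.DataFrame({"n": range(100), "square": [i * i for i in range(100)]})

print(df.head())   # first 5 rows by default
print(df.head(3))  # an explicit row count can be passed
print(df.tail())   # last 5 rows by default
df.info()          # prints its summary directly instead of returning a DataFrame
```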

## Series
- Columns of a DataFrame are themselves a specialized pandas structure called a Series.
- Extracting a single column from a DataFrame returns a Series.
- The extracted Series has its own head() method and inherits its name attribute from the DataFrame column.

### values attribute
- Extracts the numerical entries from a Series
- Yields a NumPy array

![series](Images/series.png)

- pandas Series = 1D labeled NumPy array
- pandas DataFrame = 2D labeled array whose columns are Series

![series and dataframe](Images/series-and-dataframe.png)
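
The relationship between a DataFrame, a Series, and a NumPy array can be sketched as follows. The frame `df` and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"low": [1.5, 2.5, 3.5], "high": [2.0, 3.0, 4.0]})

low = df["low"]      # selecting a single column returns a Series
print(type(low))
print(low.name)      # the Series name comes from the column label
print(low.head(2))   # a Series has its own head() method

arr = low.values     # the underlying data as a NumPy array
print(type(arr))
```

Recent pandas documentation suggests `Series.to_numpy()` as the preferred way to obtain the same array.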

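The cells below assume that `AAPL` has already been loaded as a DataFrame. In a fresh session, where it is not defined, they raise a NameError.

In [1]:
import pandas as pd
type(AAPL)

NameError: name 'AAPL' is not defined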

In [None]:
AAPL.shape

In [2]:
AAPL.columns

NameError: name 'AAPL' is not defined

In [None]:
type(AAPL.columns)

## Building DataFrames from scratch

### DataFrames from CSV files
![csv_dataframes](Images/csv_dataframes.png)
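
A minimal sketch of loading a CSV with `pd.read_csv`. To keep the example self-contained it reads from an in-memory string rather than a file on disk; the column names and values are invented and do not come from the screenshot.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file; contents are hypothetical.
csv_text = """Date,Open,Close
2015-01-06,106.5,106.3
2015-01-05,108.3,106.2
2015-01-02,111.4,109.3
"""

# index_col picks the column used as row labels; parse_dates=True asks
# pandas to parse that index as dates.
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

print(df)
print(type(df.index))
```

With a real file, the first argument would be the file path instead of the `StringIO` object.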

## Creating DataFrames from dictionaries

### Method 1

In [13]:
import pandas as pd
# Keys of the dictionary are used as column labels
data = {'weekday': ['Sun', 'Sun', 'Mon', 'Mon'],
        'city': ['Austin', 'Dallas', 'Austin', 'Dallas'],
        'visitors': [139, 237, 326, 456],
        'signups': [7, 12, 3, 5]}
users = pd.DataFrame(data)
print(users)

  weekday    city  visitors  signups
0     Sun  Austin       139        7
1     Sun  Dallas       237       12
2     Mon  Austin       326        3
3     Mon  Dallas       456        5


With no index specified, the row labels default to the integers 0 to 3.
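
To contrast the default integer labels with explicit ones, here is a small sketch that passes the `index` argument to `pd.DataFrame`. The labels `'a'` and `'b'` and the two-row data are hypothetical.

```python
import pandas as pd

data = {"city": ["Austin", "Dallas"], "signups": [7, 12]}

default_labels = pd.DataFrame(data)                    # rows labeled 0, 1
custom_labels = pd.DataFrame(data, index=["a", "b"])  # rows labeled 'a', 'b'

print(default_labels.index)
print(custom_labels.index)
print(custom_labels.loc["b", "signups"])  # lookup by the custom row label
```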

### Method 2

In [16]:
cities = ['Austin', 'Dallas', 'Austin', 'Dallas']
signups = [7, 12, 3, 5]
visitors = [139, 237, 326, 456]
weekdays = ['Sun', 'Sun', 'Mon', 'Mon']
list_labels = ['city', 'signups', 'visitors', 'weekday']
list_cols = [cities, signups, visitors, weekdays]  # A list of lists
# Pair each column label with its list of values
zipped = list(zip(list_labels, list_cols))

In [17]:
data = dict(zipped)
users = pd.DataFrame(data)
users

| | city | signups | visitors | weekday |
|---|---|---|---|---|
| 0 | Austin | 7 | 139 | Sun |
| 1 | Dallas | 12 | 237 | Sun |
| 2 | Austin | 3 | 326 | Mon |
| 3 | Dallas | 5 | 456 | Mon |


## Broadcasting

In [20]:
users['fees'] = 0  # Broadcasts the scalar 0 to every row of the new column
users

| | city | signups | visitors | weekday | fees |
|---|---|---|---|---|---|
| 0 | Austin | 7 | 139 | Sun | 0 |
| 1 | Dallas | 12 | 237 | Sun | 0 |
| 2 | Austin | 3 | 326 | Mon | 0 |
| 3 | Dallas | 5 | 456 | Mon | 0 |
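
Broadcasting also applies when a DataFrame is built from a dictionary that mixes a list with a scalar, and in arithmetic between a column and a scalar. The sketch below uses invented names (`heights`, `sex`, `height_cm`) and values purely for illustration.

```python
import pandas as pd

heights = [59.0, 65.2, 62.9]  # hypothetical values

# The scalar 'M' is repeated to match the length of the heights list.
people = pd.DataFrame({"height": heights, "sex": "M"})

# Arithmetic with a scalar is applied to every element of the column.
people["height_cm"] = people["height"] * 2.54

print(people)
```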


## Relabeling