## INSTALLATION AND LOADING OF PANDAS
**Tips**
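- Pandas is built on NumPy. Installing Pandas with pip or conda (for example, `pip install pandas`) also installs NumPy if it is missing!
- For Pandas help in Jupyter or IPython: *pd?*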


In [12]:
import pandas as pd # Typically import under the alias pd

## DATA STRUCTURES IN PANDAS: OVERVIEW
- Data structures in Pandas are analogous to NumPy arrays, but in Pandas we identify rows and columns with labels rather than only with the integer positions used in NumPy
  - They also have some analogies to Python dictionaries!
<br><br>
- Label indices are not limited to being integers - strings are common and useful labels!
<br><br>
- Two data structures: 
  - Series
  - DataFrame
<br><br>
- The data structures have special attributes and methods!
<br><br>
- We'll often read from files into data structures, but you can also create from scratch - using a NumPy array or otherwise!
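
As a minimal sketch of creating a structure from scratch, the cell below builds the same Series two ways: from a NumPy array plus a list of labels, and from a Python dictionary. The variable names (`temps`, `cities`, `from_array`, `from_dict`) and the sample values are illustrative assumptions, not part of the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values (deg F), chosen only for illustration
temps = np.array([85, 60, 89, 80])
cities = ['Champaign', 'Anchorage', 'Miami', 'Los Angeles']

# From a NumPy array plus explicit labels
from_array = pd.Series(temps, index=cities)

# From a dictionary: the keys become the index labels
from_dict = pd.Series({'Champaign': 85, 'Anchorage': 60,
                       'Miami': 89, 'Los Angeles': 80})

print(from_array)
print(from_dict)
```

Both routes give the same labels and values; the dictionary form simply makes the label-to-value pairing explicit.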
  
## PANDAS SERIES: OVERVIEW

**Definition**: a 1D array of data that is *explicitly* indexed
  - analogous in many ways to a 1D NumPy array with flexible indices

### (1) SYNTAX
- *create from scratch*: pd.Series(data,index)
<br><br>
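- *read in from file*:  pd.read_csv('filename') returns a DataFrame; select a single column (or call .squeeze('columns') on a one-column result) to get a Series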
<br><br>
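
A small sketch of the file-reading route, under the assumption that the file has a label column and one data column. To keep the cell self-contained, the CSV text is held in a string and wrapped in `io.StringIO` instead of being read from disk; the column names `city` and `high_temp_F` are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """city,high_temp_F
Champaign,85
Anchorage,60
Miami,89
Los Angeles,80
"""

# read_csv returns a DataFrame; index_col makes the city names the row labels
df = pd.read_csv(io.StringIO(csv_text), index_col='city')

# Selecting one column of the DataFrame yields a Series
highs = df['high_temp_F']
print(type(highs))
print(highs)
```

With a real file, the `io.StringIO(csv_text)` argument would be replaced by the file name.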

In [13]:
# Let's create a simple Series

# High temperature on a summer day at a few locations (in deg F)
# Define the values first, then the named indices (the data labels)
# We used strings as indices!  This is permitted!
data = pd.Series([85, 60, 89, 80], index=['Champaign', 'Anchorage', 'Miami', 'Los Angeles'])
print(data)

Champaign      85
Anchorage      60
Miami          89
Los Angeles    80
dtype: int64


### (2) CHARACTERISTICS
- data.values = values of the Series
- data.index = indices of the Series 

In [14]:
data.values 

array([85, 60, 89, 80], dtype=int64)

In [15]:
data.index # We used STRINGS as indices!  They do not have to be integers!

Index(['Champaign', 'Anchorage', 'Miami', 'Los Angeles'], dtype='object')
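
To illustrate what "explicitly indexed" buys us, here is a short sketch that looks up values in the same Series by label (dictionary style) and by integer position (NumPy style). The use of `.loc` and `.iloc` goes slightly beyond the notes above and is shown only as one common way to make the two kinds of access explicit.

```python
import pandas as pd

data = pd.Series([85, 60, 89, 80],
                 index=['Champaign', 'Anchorage', 'Miami', 'Los Angeles'])  # in deg F

by_label = data['Miami']         # look up by label, like a dictionary key
by_loc = data.loc['Miami']       # the same lookup, explicitly label-based
by_position = data.iloc[2]       # look up by integer position, like a NumPy array
has_label = 'Miami' in data      # membership test checks the index labels

print(by_label, by_loc, by_position, has_label)
```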

## PANDAS DATAFRAME: OVERVIEW
- **Definition**: a 2D array of data that is *explicitly* indexed
    - analogous in many ways to a 2D NumPy array with flexible row and column indices 
<br><br>
- Link to Series:  a DataFrame is a sequence of "aligned" Series -- sharing the same index!
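
A minimal sketch of what "aligned" means here: when two Series are combined into a DataFrame, values are matched by index label, not by position. The Series below are hypothetical and deliberately list the same cities in different orders.

```python
import pandas as pd

temperature = pd.Series([85, 60, 89],
                        index=['Champaign', 'Anchorage', 'Miami'])
# Same labels, deliberately listed in a different order
dewpoint = pd.Series([70, 68, 47],
                     index=['Miami', 'Champaign', 'Anchorage'])

# Each column is matched to the rows by label
df = pd.DataFrame({'temperature': temperature, 'dewpoint': dewpoint})
print(df)

# Miami's dewpoint is still 70, even though it was listed first in `dewpoint`
print(df.loc['Miami', 'dewpoint'])
```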


### (1) SYNTAX

- *creation from scratch*: pd.DataFrame(data, index=index, columns=columns)
<br><br>
- *reading in from file*:  pd.read_csv('filename')
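
As a sketch of the from-scratch syntax, the cell below builds a DataFrame directly from a 2D NumPy array, supplying row labels through `index` and column labels through `columns`. The variable name `values` and the numbers are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Rows are cities, columns are (temperature, dewpoint) in deg F
values = np.array([[85, 68],
                   [60, 47],
                   [89, 70],
                   [80, 60]])

df = pd.DataFrame(values,
                  index=['Champaign', 'Anchorage', 'Miami', 'Los Angeles'],
                  columns=['temperature', 'dewpoint'])
print(df)
print(df.shape)
```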

In [16]:
# Let's create another Series for these cities
# This time for dewpoint
data2 = pd.Series([68,47,70,60],index=['Champaign','Anchorage','Miami','Los Angeles']) # note the SHARED index 

In [17]:
# Let's put these two Series together to create a DataFrame!
#  Look familiar?  There are dictionary analogies for these data structures, too!
tdata = pd.DataFrame({'temperature': data,'dewpoint': data2})
print(tdata)
print(type(tdata))

             temperature  dewpoint
Champaign             85        68
Anchorage             60        47
Miami                 89        70
Los Angeles           80        60
<class 'pandas.core.frame.DataFrame'>


In [18]:
print(type(tdata['temperature'])) # proving to ourselves that a column of a DataFrame is a Series!

<class 'pandas.core.series.Series'>


### (2) CHARACTERISTICS
- df.values - the values of the DataFrame as a NumPy array
- df.index - the row labels (indices) of the DataFrame
- df.columns - the labels of the columns

In [19]:
tdata.values # the values of the elements themselves

array([[85, 68],
       [60, 47],
       [89, 70],
       [80, 60]], dtype=int64)

In [20]:
tdata['temperature'].values # values of the elements in a specific column

array([85, 60, 89, 80], dtype=int64)

In [21]:
tdata.index # the indices, aka the city names

Index(['Champaign', 'Anchorage', 'Miami', 'Los Angeles'], dtype='object')

In [22]:
tdata.columns # the column labels

Index(['temperature', 'dewpoint'], dtype='object')

**IMPORTANT NOTE**: <br>
- df['column_name'] is recommended over df.column_name
  - the two can often be used interchangeably, but not *always*
    - example:  if the column name contains spaces, or matches an existing DataFrame attribute or method name, the attribute form will not work
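
A short sketch of why bracket access is the safer habit. The DataFrame below is hypothetical: one column name contains a space, and the other (`count`) happens to share its name with a DataFrame method.

```python
import pandas as pd

df = pd.DataFrame({'high temp': [85, 60], 'count': [3, 5]},
                  index=['Champaign', 'Anchorage'])

# Bracket access accepts any column label, including ones with spaces
print(df['high temp'])
# df.high temp   # would be a SyntaxError: attribute names cannot contain spaces

# Bracket access returns the column of data...
print(df['count'])
# ...but attribute access finds the DataFrame.count method, not the column
print(callable(df.count))
```

Bracket access behaves the same way for every column label, so it avoids both surprises.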