# Data Frame Introduction

- **DataFrame** is a 2-dimensional labeled data structure (labelled rows, and labelled columns (variable names)).


- Each column can be of a **different type (numeric, string, boolean, ...)**.


- A DataFrame represents a tabular, spreadsheet-like structure containing an ordered collection of columns. 


- You can think of it as an Excel **spreadsheet**, a **SQL table**, or a **dict of Series objects**.


- It is similar to the __data.frame__ object in the R language.

- It is generally the most commonly used pandas object.
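
To make the "dict of Series" picture concrete, here is a minimal sketch. The names `names`, `ages`, `member` and `people` and the data are made up for illustration; the point is only that each column keeps its own type.

```python
import pandas as pd

# Hypothetical example data: three Series sharing the same row labels
names = pd.Series(["Ann", "Bob", "Cid"], index=["r1", "r2", "r3"])
ages = pd.Series([31, 45, 27], index=["r1", "r2", "r3"])
member = pd.Series([True, False, True], index=["r1", "r2", "r3"])

# A dict of Series becomes a DataFrame; each key becomes a column name
people = pd.DataFrame({"name": names, "age": ages, "member": member})

# Each column keeps its own dtype
print(people.dtypes)
```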

### Setting up the workspace

In [1]:
import pandas as pd
import numpy as np
from random import sample, choices, seed
from numpy.random import randn

#### DataFrame Example

In [2]:
# Creating index values
ind = ["R" + "_" + str(num) for num in range(1, 16)]

# Creating column names (Variables)
cols = ["VAR" + "_" + str(num) for num in range(1, 7)]

# Generating Random Data
my_data = randn(90).reshape(15, 6)

In [3]:
df = pd.DataFrame(data = my_data, index = ind, columns= cols)
df.head()

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4,VAR_5,VAR_6
R_1,1.531361,-0.053335,2.307327,-0.571952,-0.279661,-1.015311
R_2,2.069491,0.474643,-0.321227,0.910526,-1.181246,0.192481
R_3,1.140335,0.202117,-1.355983,1.563494,-1.109698,-1.058797
R_4,0.077481,-1.089736,-1.072905,-0.76484,1.189936,0.266694
R_5,1.106107,-0.822567,0.443734,-1.120329,-0.209535,-0.336132


## Common DataFrames Attributes

### Columns Attribute

The __columns__ attribute returns the column (variable) names.

In [4]:
df.columns

Index(['VAR_1', 'VAR_2', 'VAR_3', 'VAR_4', 'VAR_5', 'VAR_6'], dtype='object')

### Values Attribute

  The __values__ attribute returns the data contained in the DataFrame as a 2D ndarray.

In [5]:
df.head().values

array([[ 1.53136069, -0.05333493,  2.30732677, -0.57195175, -0.2796606 ,
        -1.01531148],
       [ 2.06949056,  0.47464295, -0.32122711,  0.91052592, -1.18124613,
         0.19248116],
       [ 1.14033473,  0.2021169 , -1.35598253,  1.56349423, -1.10969788,
        -1.05879743],
       [ 0.07748071, -1.08973559, -1.07290505, -0.76483992,  1.18993579,
         0.26669432],
       [ 1.10610738, -0.82256722,  0.44373433, -1.12032906, -0.20953513,
        -0.33613215]])
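
As far as I know, the pandas documentation recommends `DataFrame.to_numpy()` for new code as an alternative to `.values`. A minimal sketch on a made-up frame (`small` is a hypothetical name):

```python
import numpy as np
import pandas as pd

small = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})  # hypothetical data

# For this all-float frame, to_numpy() gives the same 2D ndarray as .values
arr = small.to_numpy()
print(type(arr), arr.shape)
```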

### Shape Attribute

  The __shape__ attribute returns a tuple holding the number of rows and the number of columns.

In [6]:
df.shape

(15, 6)

---
# Selection (Slicing), Creation and Deletion

   - Selecting, creating and deleting (dropping) are common tasks when dealing with data. Each process is discussed separately in this tutorial.
   

## Selection (Slicing)

### Variable Selection

- **String (label) indexing**: Passing the variable name in square brackets to the DataFrame object retrieves that variable (**this is called dict-like notation**). The syntax is shown here.

```python

   DataFrame['column-name']
    
```
  
#### Note: 

- Selecting one variable returns a **Series**, not a DataFrame. In this sense, a DataFrame object can be seen as a collection of Series objects that share the same index.

In [7]:
# Select The First variable
var_1 = df['VAR_1']
var_1

R_1     1.531361
R_2     2.069491
R_3     1.140335
R_4     0.077481
R_5     1.106107
R_6    -0.195087
R_7     0.529832
R_8    -1.094711
R_9     0.755984
R_10   -0.427899
R_11    0.935375
R_12    0.067525
R_13   -1.790720
R_14   -0.111392
R_15   -0.606929
Name: VAR_1, dtype: float64

In [8]:
# Examine the type 
type(var_1)

pandas.core.series.Series

  Indeed, __var_1__ is a pandas Series object with the same index as the DataFrame, and its name has been set appropriately.

In [9]:
var_1.name

'VAR_1'

### Remark: 

  - As mentioned in the definition, a DataFrame is a collection of Series combined together and sharing the same index.
  
  
  - If we have two Series, we can construct a DataFrame from them. 
  
  
  - We do that in the next example.

In [10]:
var_2 = df['VAR_2']

> We extracted two series, now we join them together to form a new data frame by providing a dict of series.

In [11]:
df2 = pd.DataFrame(data = {'VAR_1': var_1, 'VAR_2':var_2})

In [12]:
df2.head()

Unnamed: 0,VAR_1,VAR_2
R_1,1.531361,-0.053335
R_2,2.069491,0.474643
R_3,1.140335,0.202117
R_4,0.077481,-1.089736
R_5,1.106107,-0.822567


In [13]:
# Check the type of df2
type(df2)

pandas.core.frame.DataFrame

### Accessing more than one variable

  - If we want to select two or more variables from a data frame, we use double square brackets ([[...]]); in other words, we pass a list of variable names inside the indexing brackets.
  
Here is the syntax:
```python
   df[['var1', 'var2', ...]]
```

In [14]:
df[['VAR_1', 'VAR_3']].head() 

Unnamed: 0,VAR_1,VAR_3
R_1,1.531361,2.307327
R_2,2.069491,-0.321227
R_3,1.140335,-1.355983
R_4,0.077481,-1.072905
R_5,1.106107,0.443734


> I am chaining the __head()__ method to print only the first few observations. 
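
The difference between single and double brackets can be checked on a small made-up frame (`toy` is a hypothetical name): a single label gives a Series, while a list gives a DataFrame even when it holds only one name.

```python
import pandas as pd

toy = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})  # hypothetical data

as_series = toy["x"]     # a column label -> Series
as_frame = toy[["x"]]    # a one-element list -> DataFrame with one column

print(type(as_series).__name__)
print(type(as_frame).__name__)
```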

### Accessing the DataFrame Variable Using the Dot Operator

   - With a DataFrame, we can access a variable using the __dot operator__ (also called **attribute** access), just like accessing a method. 
   
   
   - This can be confusing at first, and some Python programmers do not recommend it. However, it is worth knowing because it appears in a lot of code.
   
### Note: 

  - Attribute notation works only when the variable name is a __valid Python identifier__ (it does not work if the name contains a space) and does not clash with an existing DataFrame attribute or method name.

In [15]:
df.VAR_1.head()

R_1    1.531361
R_2    2.069491
R_3    1.140335
R_4    0.077481
R_5    1.106107
Name: VAR_1, dtype: float64
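
The limits of the dot operator can be sketched on a made-up frame. The column names `my var`, `count` and `ok` are chosen only for illustration: one contains a space, one collides with the `DataFrame.count` method.

```python
import pandas as pd

# Hypothetical columns: one name contains a space, one collides with a method
toy = pd.DataFrame({"my var": [1, 2], "count": [10, 20], "ok": [5, 6]})

print(toy.ok.tolist())         # attribute access works for a plain identifier
print(callable(toy.count))     # toy.count is the DataFrame.count method, not the column
print(toy["count"].tolist())   # bracket notation always reaches the column
print(toy["my var"].tolist())  # names with spaces need brackets too
```

Bracket notation works in every case, which is one reason some programmers prefer it.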

### Rows Selection


  - **Row selection (slicing)** in a pandas DataFrame can be done in two ways:
  
#### 1. String or label-index:

  - **The _loc_ method**: a **label index (here, a string label)** is passed to the __loc method__ in square brackets in order to select a row.  
  
Here is the syntax:
  
```python
  df.loc['string-index']
```

#### Selecting first row labelled as 'R_1'

In [16]:
first_row = df.loc['R_1']
first_row

VAR_1    1.531361
VAR_2   -0.053335
VAR_3    2.307327
VAR_4   -0.571952
VAR_5   -0.279661
VAR_6   -1.015311
Name: R_1, dtype: float64

#### Examine the type

In [17]:
type(first_row)

pandas.core.series.Series

### Note:

   - Selecting one row also returns a pandas Series object. Thus, slicing one element __horizontally or vertically__ results in a __Series object__.

In [18]:
# Select another row
# df['R_6']          # This will not work, because pandas thinks it is a variable
df.loc['R_6']

VAR_1   -0.195087
VAR_2   -1.194457
VAR_3   -2.270838
VAR_4   -0.100887
VAR_5    1.213398
VAR_6   -3.339269
Name: R_6, dtype: float64

#### 2. Integer-index Selection

 - **The _iloc method_**: This is the second way of slicing a DataFrame: passing the **integer index (position)** to the **iloc method**. 

Here is the syntax.
 
```python
df.iloc[int-index]
```

##### Selecting the first row (position is zero)

In [19]:
df.iloc[0]

VAR_1    1.531361
VAR_2   -0.053335
VAR_3    2.307327
VAR_4   -0.571952
VAR_5   -0.279661
VAR_6   -1.015311
Name: R_1, dtype: float64

In [20]:
# Select another row
# df[5]              # This won't work either: pandas looks for a variable named 5
df.iloc[5]

VAR_1   -0.195087
VAR_2   -1.194457
VAR_3   -2.270838
VAR_4   -0.100887
VAR_5    1.213398
VAR_6   -3.339269
Name: R_6, dtype: float64
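
The distinction between labels and positions matters most when the row labels are themselves integers. A minimal sketch on a made-up frame (`toy` is a hypothetical name) whose labels do not start at 0:

```python
import pandas as pd

# Hypothetical frame whose row labels are integers that do not start at 0
toy = pd.DataFrame({"v": ["a", "b", "c"]}, index=[10, 20, 30])

by_label = toy.loc[10, "v"]     # row whose label is 10
by_position = toy.iloc[0, 0]    # row at position 0 (the same row here)
last_row = toy.iloc[-1, 0]      # negative positions count from the end

print(by_label, by_position, last_row)
```

Here `toy.loc[0]` would raise a `KeyError`, because no row carries the label 0.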

### Multiple Row Selection

  - Selecting more than one row (case, observation) is done using double square brackets ([[...]]); in Python parlance, we pass a list of indexes to either the __loc method__ (label index) or the __iloc method__ (integer index). 
  
  
  - In other words, **pass a list to either method**
  
```python
# 1. loc method: a list of row labels
df.loc[['Row1', 'Row2', '...']]

# 2. iloc method: a list of integer positions
df.iloc[[1, 2, 4, ...]]
```

#### Select the first and the third Row

In [21]:
df.loc[['R_1', 'R_3']].head(3)

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4,VAR_5,VAR_6
R_1,1.531361,-0.053335,2.307327,-0.571952,-0.279661,-1.015311
R_3,1.140335,0.202117,-1.355983,1.563494,-1.109698,-1.058797


> We can get the same result using iloc as follows:

In [22]:
df.iloc[[0, 2]].head(3)

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4,VAR_5,VAR_6
R_1,1.531361,-0.053335,2.307327,-0.571952,-0.279661,-1.015311
R_3,1.140335,0.202117,-1.355983,1.563494,-1.109698,-1.058797


> **Pandas Slicing is AWESOME, isn't it!** 👊👊👊

---

### Row-Column Selection (Extremely Important Section)

- After learning how to slice columns and rows, combining the two gives __row-column selection__. Here is the syntax:

```python

# 1. One value
df.loc['R_1', 'VAR_1']    # or, by position: df.iloc[0, 0]

# 2. One row, multiple columns: a row label and a list of variable names
df.loc['R_1', ['VAR_1', 'VAR_3', ...]]

# 3. Multiple rows, one column: a list of row labels and a variable name
df.loc[['R_1', 'R_3', ...], 'VAR_1']

# 4. Multiple rows, multiple columns: two lists (row labels first, then variable names)
df.loc[['R_1', 'R_3', ...], ['VAR_1', 'VAR_3', ...]]
```

We can use __iloc__ instead if we know the integer positions of the rows and columns. I think __loc__ is more convenient.  

> ##### 1. One-Value

In [23]:
df.loc['R_1', 'VAR_1'] 

1.5313606938234037

In [24]:
# Or
df.iloc[0, 0]

1.5313606938234037

> ##### 2. One row-Multiple columns

In [25]:
df.loc['R_1', ['VAR_1', 'VAR_3', 'VAR_6']] 

VAR_1    1.531361
VAR_3    2.307327
VAR_6   -1.015311
Name: R_1, dtype: float64

> Note: Selecting one row results in a Series object

In [26]:
type(df.loc['R_1', ['VAR_1', 'VAR_3', 'VAR_6']])

pandas.core.series.Series

> ##### 3. Multiple rows- One column

In [27]:
df.loc[['R_1', 'R_3', 'R_5'], 'VAR_1'] 

R_1    1.531361
R_3    1.140335
R_5    1.106107
Name: VAR_1, dtype: float64

> Note: Selecting one variable results in a Series object as well. 

In [28]:
type(df.loc[['R_1', 'R_3', 'R_5'], 'VAR_1'])

pandas.core.series.Series

> ##### 4. Multiple rows- Multiple columns

In [29]:
df.loc[['R_1', 'R_3'], ['VAR_1', 'VAR_3']]

Unnamed: 0,VAR_1,VAR_3
R_1,1.531361,2.307327
R_3,1.140335,-1.355983


In [30]:
df.loc[['R_1', 'R_3', 'R_5'], ['VAR_1', 'VAR_3', 'VAR_6']]

Unnamed: 0,VAR_1,VAR_3,VAR_6
R_1,1.531361,2.307327,-1.015311
R_3,1.140335,-1.355983,-1.058797
R_5,1.106107,0.443734,-0.336132


> Selecting multiple rows and multiple columns returns a DataFrame. 
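
For completeness, the same kind of selection can be written with positions. This is a sketch on a made-up 4x3 frame (`toy` is a hypothetical name) whose labels follow the tutorial's naming style:

```python
import numpy as np
import pandas as pd

# Hypothetical 4x3 frame with labels in the same style as the tutorial
toy = pd.DataFrame(np.arange(12).reshape(4, 3),
                   index=["R_1", "R_2", "R_3", "R_4"],
                   columns=["VAR_1", "VAR_2", "VAR_3"])

by_label = toy.loc[["R_1", "R_3"], ["VAR_1", "VAR_3"]]
by_position = toy.iloc[[0, 2], [0, 2]]   # same rows and columns, by position

print(by_label.equals(by_position))
```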

---
## Creation

### Adding New Variables 

   - Creating a new variable is straightforward. We index the DataFrame with the new variable name as if it already existed, then assign values to it. The syntax is as follows:
   
```python
df['new_var'] = new_values
```

In [31]:
df['NEW_VAR'] = df['VAR_1'] + df['VAR_2']
df.head()

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4,VAR_5,VAR_6,NEW_VAR
R_1,1.531361,-0.053335,2.307327,-0.571952,-0.279661,-1.015311,1.478026
R_2,2.069491,0.474643,-0.321227,0.910526,-1.181246,0.192481,2.544134
R_3,1.140335,0.202117,-1.355983,1.563494,-1.109698,-1.058797,1.342452
R_4,0.077481,-1.089736,-1.072905,-0.76484,1.189936,0.266694,-1.012255
R_5,1.106107,-0.822567,0.443734,-1.120329,-0.209535,-0.336132,0.28354


### Adding Empty Variable

   - If we pass a column name that isn't contained in the data to the __columns__ argument, that column is created and filled with __NaN__. 


In [32]:
df2 = pd.DataFrame(data = df,  index = ind,
        columns = ['VAR_1', 'VAR_2', 'VAR_3', 'VAR_4', 'VAR_5', 'VAR_6', 'EmptyVar'])
df2.head()

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4,VAR_5,VAR_6,EmptyVar
R_1,1.531361,-0.053335,2.307327,-0.571952,-0.279661,-1.015311,
R_2,2.069491,0.474643,-0.321227,0.910526,-1.181246,0.192481,
R_3,1.140335,0.202117,-1.355983,1.563494,-1.109698,-1.058797,
R_4,0.077481,-1.089736,-1.072905,-0.76484,1.189936,0.266694,
R_5,1.106107,-0.822567,0.443734,-1.120329,-0.209535,-0.336132,


### Assigning values to Variables

   - Assigning new values to a variable is done using dict-like notation. We assign a single value or a list of values to the specified column. 

In [33]:
df2['EmptyVar'] = 19.8
df2.head()

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4,VAR_5,VAR_6,EmptyVar
R_1,1.531361,-0.053335,2.307327,-0.571952,-0.279661,-1.015311,19.8
R_2,2.069491,0.474643,-0.321227,0.910526,-1.181246,0.192481,19.8
R_3,1.140335,0.202117,-1.355983,1.563494,-1.109698,-1.058797,19.8
R_4,0.077481,-1.089736,-1.072905,-0.76484,1.189936,0.266694,19.8
R_5,1.106107,-0.822567,0.443734,-1.120329,-0.209535,-0.336132,19.8


> The same value appears in all rows: a single value is broadcast across the whole index.

#### We Can Change values as well by indexing a specified column

In [34]:
df2.loc['R_1':'R_5',['EmptyVar']] = 23.2
df2.head(7)

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4,VAR_5,VAR_6,EmptyVar
R_1,1.531361,-0.053335,2.307327,-0.571952,-0.279661,-1.015311,23.2
R_2,2.069491,0.474643,-0.321227,0.910526,-1.181246,0.192481,23.2
R_3,1.140335,0.202117,-1.355983,1.563494,-1.109698,-1.058797,23.2
R_4,0.077481,-1.089736,-1.072905,-0.76484,1.189936,0.266694,23.2
R_5,1.106107,-0.822567,0.443734,-1.120329,-0.209535,-0.336132,23.2
R_6,-0.195087,-1.194457,-2.270838,-0.100887,1.213398,-3.339269,19.8
R_7,0.529832,-1.026545,0.007433,-1.367197,-1.004242,-1.774208,19.8


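You may notice that I am using ':' between row labels. This works because the index labels are in sequence. Unlike ordinary Python slices, a label slice with __loc__ includes both endpoints, so 'R_5' is part of the selection.

- We can even do that with variables if they are in sequence. Nice and short!

---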

In [35]:
df2.loc['R_1':'R_5', 'VAR_1':'VAR_4'].head() 

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4
R_1,1.531361,-0.053335,2.307327,-0.571952
R_2,2.069491,0.474643,-0.321227,0.910526
R_3,1.140335,0.202117,-1.355983,1.563494
R_4,0.077481,-1.089736,-1.072905,-0.76484
R_5,1.106107,-0.822567,0.443734,-1.120329
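
One detail worth checking is how the endpoints of a slice are treated. In this sketch on a made-up frame (`toy` is a hypothetical name), the label slice keeps its last label while the position slice stops before its end position:

```python
import pandas as pd

toy = pd.DataFrame({"v": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])  # hypothetical data

label_slice = toy.loc["a":"c"]    # includes the row labelled "c"
position_slice = toy.iloc[0:2]    # stops before position 2, like a Python list

print(len(label_slice), len(position_slice))
```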


### Assigning an array-like to an empty column

If we want to assign a list or a NumPy array to a variable, its length must match the number of rows in the DataFrame. 

In [36]:
df2['EmptyVar'] = np.arange(1, 16)
df2.head()

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4,VAR_5,VAR_6,EmptyVar
R_1,1.531361,-0.053335,2.307327,-0.571952,-0.279661,-1.015311,1
R_2,2.069491,0.474643,-0.321227,0.910526,-1.181246,0.192481,2
R_3,1.140335,0.202117,-1.355983,1.563494,-1.109698,-1.058797,3
R_4,0.077481,-1.089736,-1.072905,-0.76484,1.189936,0.266694,4
R_5,1.106107,-0.822567,0.443734,-1.120329,-0.209535,-0.336132,5


In [37]:
df2.tail()

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4,VAR_5,VAR_6,EmptyVar
R_11,0.935375,0.501264,0.694028,-0.817003,-0.298779,-0.75909,11
R_12,0.067525,0.511827,-0.667459,-1.568217,1.091447,-0.388848,12
R_13,-1.79072,1.534121,0.306633,0.013719,-0.907319,-0.318479,13
R_14,-0.111392,-0.683768,-2.069643,0.544413,0.349911,0.758114,14
R_15,-0.606929,1.580305,-0.733967,-0.54592,0.310681,-0.662977,15
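
What happens when the lengths do not match can be sketched on a made-up frame (`toy` is a hypothetical name). A list of the wrong length raises a `ValueError`, whereas a Series is matched on index labels instead of position:

```python
import pandas as pd

toy = pd.DataFrame({"v": [1, 2, 3]}, index=["a", "b", "c"])  # hypothetical data

# A list of the wrong length is rejected
try:
    toy["bad"] = [1, 2]
except ValueError as err:
    print("ValueError:", err)

# A Series is matched on index labels; rows without a match get NaN
toy["partial"] = pd.Series([10.0, 30.0], index=["a", "c"])
print(toy)
```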


---
## Dropping

###  Dropping Variables

  - Dropping a variable can be done using the pandas DataFrame __drop method__.
  
  
  - **By default, the drop method looks for the given labels in the row index, not in the columns, because the axis argument defaults to zero (axis = 0).**
  
  
  - **Dropping variables** is achieved by __setting axis = 1__. 
  
  
  - Pandas handles dropping variables carefully:
  
     - it protects us from deleting variables accidentally (because losing data is expensive). 
     
     - For this reason, there is an argument called __inplace__ that controls whether the original DataFrame is modified.
  
          - Setting __inplace = False__ (which is the default) leaves the original data unchanged and returns a new DataFrame without the variable. 
     
          - Setting __inplace = True__ removes the variable from the original DataFrame and returns None.  

#### Example: Dropping variables with inplace = False

In [38]:
# Set axis = 1 to drop a column; inplace = False returns a new DataFrame
df.drop('NEW_VAR', axis = 1, inplace = False).head()

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4,VAR_5,VAR_6
R_1,1.531361,-0.053335,2.307327,-0.571952,-0.279661,-1.015311
R_2,2.069491,0.474643,-0.321227,0.910526,-1.181246,0.192481
R_3,1.140335,0.202117,-1.355983,1.563494,-1.109698,-1.058797
R_4,0.077481,-1.089736,-1.072905,-0.76484,1.189936,0.266694
R_5,1.106107,-0.822567,0.443734,-1.120329,-0.209535,-0.336132


####  Check the original data frame

In [39]:
df.head()

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4,VAR_5,VAR_6,NEW_VAR
R_1,1.531361,-0.053335,2.307327,-0.571952,-0.279661,-1.015311,1.478026
R_2,2.069491,0.474643,-0.321227,0.910526,-1.181246,0.192481,2.544134
R_3,1.140335,0.202117,-1.355983,1.563494,-1.109698,-1.058797,1.342452
R_4,0.077481,-1.089736,-1.072905,-0.76484,1.189936,0.266694,-1.012255
R_5,1.106107,-0.822567,0.443734,-1.120329,-0.209535,-0.336132,0.28354


 - Indeed, the variable is still there. If we want to delete the variable permanently, we should set inplace to True.

In [40]:
df.drop('NEW_VAR', axis = 1, inplace = True)

In [41]:
df.head()

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4,VAR_5,VAR_6
R_1,1.531361,-0.053335,2.307327,-0.571952,-0.279661,-1.015311
R_2,2.069491,0.474643,-0.321227,0.910526,-1.181246,0.192481
R_3,1.140335,0.202117,-1.355983,1.563494,-1.109698,-1.058797
R_4,0.077481,-1.089736,-1.072905,-0.76484,1.189936,0.266694
R_5,1.106107,-0.822567,0.443734,-1.120329,-0.209535,-0.336132


**The variable has been permanently removed.**
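
Two alternatives that lead to the same result are sketched below on a made-up frame (`toy` and `trimmed` are hypothetical names): the `columns=` keyword of `drop`, which replaces the label plus `axis = 1` pair, and reassigning the returned DataFrame instead of using `inplace = True`.

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})  # hypothetical data

# columns=... is equivalent to passing the labels with axis=1
trimmed = toy.drop(columns=["b"])

# Reassigning the result is an alternative to inplace=True
toy = toy.drop(columns=["c"])

print(list(trimmed.columns), list(toy.columns))
```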

### Dropping Multiple Variables

  - This is done by providing a list of variables to the **drop** method, together with axis = 1. 
  
```python

  df.drop(['V_1', 'V_2', ...], axis = 1)
```


In [42]:
df.drop(['VAR_1', 'VAR_3', 'VAR_6'], axis = 1).head(3)

Unnamed: 0,VAR_2,VAR_4,VAR_5
R_1,-0.053335,-0.571952,-0.279661
R_2,0.474643,0.910526,-1.181246
R_3,0.202117,1.563494,-1.109698


### Dropping Rows (Observations)

  - The drop method drops rows from the data frame by default.
  
  
  - Passing a single index label drops one row.
  
  
  - Passing a list of index labels drops multiple rows.
  
  
  - The drop method matches labels, not integer positions; an integer only works when the index labels are themselves integers.
  
  
  - Setting __inplace = True__ will remove observations permanently. 

#### Dropping One Row

In [43]:
df.drop('R_1', axis = 0).head()

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4,VAR_5,VAR_6
R_2,2.069491,0.474643,-0.321227,0.910526,-1.181246,0.192481
R_3,1.140335,0.202117,-1.355983,1.563494,-1.109698,-1.058797
R_4,0.077481,-1.089736,-1.072905,-0.76484,1.189936,0.266694
R_5,1.106107,-0.822567,0.443734,-1.120329,-0.209535,-0.336132
R_6,-0.195087,-1.194457,-2.270838,-0.100887,1.213398,-3.339269


In [44]:
df.head()

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4,VAR_5,VAR_6
R_1,1.531361,-0.053335,2.307327,-0.571952,-0.279661,-1.015311
R_2,2.069491,0.474643,-0.321227,0.910526,-1.181246,0.192481
R_3,1.140335,0.202117,-1.355983,1.563494,-1.109698,-1.058797
R_4,0.077481,-1.089736,-1.072905,-0.76484,1.189936,0.266694
R_5,1.106107,-0.822567,0.443734,-1.120329,-0.209535,-0.336132


**The first row is still there, because we didn't specify inplace = True.**

To remove it permanently, we can do this:
 
```python
df.drop('R_1', inplace = True)
```

#### Dropping Multiple Rows

  - Pass a list of row labels to the drop method.

```python

    df.drop(['R1', 'R2', ...])
```

In [45]:
df.drop(['R_1', 'R_3', 'R_5', 'R_14', 'R_15'])

Unnamed: 0,VAR_1,VAR_2,VAR_3,VAR_4,VAR_5,VAR_6
R_2,2.069491,0.474643,-0.321227,0.910526,-1.181246,0.192481
R_4,0.077481,-1.089736,-1.072905,-0.76484,1.189936,0.266694
R_6,-0.195087,-1.194457,-2.270838,-0.100887,1.213398,-3.339269
R_7,0.529832,-1.026545,0.007433,-1.367197,-1.004242,-1.774208
R_8,-1.094711,-0.101048,0.337866,-1.765317,-0.236856,-0.876982
R_9,0.755984,0.189763,0.717457,0.78138,-0.566599,-1.407496
R_10,-0.427899,-0.861722,1.311468,0.234808,-0.470976,1.544654
R_11,0.935375,0.501264,0.694028,-0.817003,-0.298779,-0.75909
R_12,0.067525,0.511827,-0.667459,-1.568217,1.091447,-0.388848
R_13,-1.79072,1.534121,0.306633,0.013719,-0.907319,-0.318479


---


# Understanding How DataFrames Are Constructed Internally

  - There are several ways to construct a DataFrame, such as:
      - Dictionaries (the most common way)
      - NumPy ndarrays
      - dataclasses (not discussed here)

## Constructing DataFrames from Dicts

In [46]:
data = {'Var1': [*range(1, 6)], 
       'Var2': [*range(2, 11, 2)], 
       'Var3': [*range(10, 51, 10)]}
data

{'Var1': [1, 2, 3, 4, 5],
 'Var2': [2, 4, 6, 8, 10],
 'Var3': [10, 20, 30, 40, 50]}

In [47]:
dict_df = pd.DataFrame(data)
dict_df

Unnamed: 0,Var1,Var2,Var3
0,1,2,10
1,2,4,20
2,3,6,30
3,4,8,40
4,5,10,50


#### Example Two: Constructing DataFrames from Dicts

In [48]:
dat = {'states': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'], 
      'year': [2000, 2001, 2002, 2001, 2002], 
      'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
df = pd.DataFrame(dat)
df

Unnamed: 0,states,year,pop
0,Ohio,2000,1.5
1,Ohio,2001,1.7
2,Ohio,2002,3.6
3,Nevada,2001,2.4
4,Nevada,2002,2.9


### Rearrange the columns names

If we pass a list of variable names to the __columns__ argument, the columns appear in exactly the order provided.
  

In [49]:
pd.DataFrame(dat, columns = ['year', 'states', 'pop'])

Unnamed: 0,year,states,pop
0,2000,Ohio,1.5
1,2001,Ohio,1.7
2,2002,Ohio,3.6
3,2001,Nevada,2.4
4,2002,Nevada,2.9


#### Adding row label-index

Passing a list of string labels to the __index__ argument will label the data frame rows.

In [50]:
pd.DataFrame(dat, columns = ['year', 'states', 'pop'],
            index = ['A', 'B', 'C', 'D', 'E'])

Unnamed: 0,year,states,pop
A,2000,Ohio,1.5
B,2001,Ohio,1.7
C,2002,Ohio,3.6
D,2001,Nevada,2.4
E,2002,Nevada,2.9


## Constructing DataFrames from a Dict of Dicts

   - If a dict of dicts is passed to DataFrame, the __outer dict keys__ will be interpreted as __variables__, and the __inner dict keys__ as the row indices.

In [51]:
grades = {'Ahmad': {'Math': 13, 'Stats': 17.5, 'Science': 0}, 
         'Nabil': {'Math': 5.5, 'Computing': 17.75, 'stats': 18.5}, 
         'Islam': {'Math': 7.5, 'Stats': 12.5, 'Algo': 18}}
grades

{'Ahmad': {'Math': 13, 'Stats': 17.5, 'Science': 0},
 'Nabil': {'Math': 5.5, 'Computing': 17.75, 'stats': 18.5},
 'Islam': {'Math': 7.5, 'Stats': 12.5, 'Algo': 18}}

In [52]:
df_grades = pd.DataFrame(grades)
df_grades

Unnamed: 0,Ahmad,Nabil,Islam
Math,13.0,5.5,7.5
Stats,17.5,,12.5
Science,0.0,,
Computing,,17.75,
stats,,18.5,
Algo,,,18.0


> Where there is no value, __NaN__ appears in the DataFrame. Labels are case-sensitive, so 'Stats' and 'stats' end up as two separate rows.

### Transposing the DataFrame

  - We can always swap the rows and the columns by using the __.T__ DataFrame **attribute**.

In [53]:
df_grades.T

Unnamed: 0,Math,Stats,Science,Computing,stats,Algo
Ahmad,13.0,17.5,0.0,,,
Nabil,5.5,,,17.75,18.5,
Islam,7.5,12.5,,,,18.0


#### Index name, Columns names

  - Metadata (data about data) helps us understand the data. Thus, we can provide a name for the index and for the columns.

### Getting the index

In [54]:
df_grades.index

Index(['Math', 'Stats', 'Science', 'Computing', 'stats', 'Algo'], dtype='object')

### Getting the index name

In [55]:
df_grades.index.name

### Setting a name to the index

In [56]:
df_grades.index.name = 'Subjects'

### Check the index name

In [57]:
df_grades.index.name

'Subjects'

### Getting the column names

In [58]:
df_grades.columns

Index(['Ahmad', 'Nabil', 'Islam'], dtype='object')

### Setting a name attribute to columns 

In [59]:
df_grades.columns.name = 'Students'

In [60]:
df_grades.columns.name

'Students'

In [61]:
df_grades

Students,Ahmad,Nabil,Islam
Subjects,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Math,13.0,5.5,7.5
Stats,17.5,,12.5
Science,0.0,,
Computing,,17.75,
stats,,18.5,
Algo,,,18.0
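
Both names can also be set in one call with `rename_axis`, which returns a new DataFrame and leaves the original untouched. A minimal sketch on made-up data (`toy` and `named` are hypothetical names):

```python
import pandas as pd

# Hypothetical frame: subjects as rows, students as columns
toy = pd.DataFrame({"S1": [10, 12], "S2": [15, 9]}, index=["Math", "Stats"])

# rename_axis returns a new DataFrame with both axis names set
named = toy.rename_axis(index="Subjects", columns="Students")

print(named.index.name, named.columns.name)
```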


## Constructing DataFrames from 2D Arrays



In [62]:
# Stack two 1D arrays as rows, then transpose so that each becomes a column
d = np.array([np.arange(5), np.arange(5, 26, 5)]).T
d

array([[ 0,  5],
       [ 1, 10],
       [ 2, 15],
       [ 3, 20],
       [ 4, 25]])

In [63]:
pd.DataFrame(d, index = [*range(1, 6)],  columns = ['First', 'Second'])

Unnamed: 0,First,Second
1,0,5
2,1,10
3,2,15
4,3,20
5,4,25


#### Note: 

  - There are other possible data inputs to construct DataFrames from. Consult the online documentation or a specialized book. 
  
[__Python for Data Analysis__](https://www.oreilly.com/library/view/python-for-data/9781491957653/) by Wes McKinney is a good reference on this topic. 
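
As one example of another input type, a list of dicts is read as one dict per row. The records below are made up for illustration (`records` and `from_records` are hypothetical names):

```python
import pandas as pd

# Hypothetical records: one dict per row; missing keys become NaN
records = [{"state": "Ohio", "year": 2000, "pop": 1.5},
           {"state": "Nevada", "year": 2001}]

from_records = pd.DataFrame(records)
print(from_records)
```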