In [1]:
import pandas as pd

Pandas DataFrames are two-dimensional data structures with labeled rows and columns that can hold many data types.

# Creating DataFrames Manually
A DataFrame can be created manually in several ways:
* Create a DataFrame from a dictionary of Series.
* Create a DataFrame from a dictionary of lists.
* Create a DataFrame from a list of dictionaries.


### Create a DataFrame From a Dictionary of Series.

In [2]:
# We create a dictionary of Pandas Series 
items = {'Bob' : pd.Series(data = [245, 25, 55], index = ['bike', 'pants', 'watch']),
         'Alice' : pd.Series(data = [40, 110, 500, 45], index = ['book', 'glasses', 'bike', 'pants'])}

# We print the type of items to see that it is a dictionary
print(type(items))

<class 'dict'>


In [4]:
# We create a Pandas DataFrame by passing it a dictionary of Pandas Series
shopping_carts = pd.DataFrame(items)
shopping_carts

,Bob,Alice
bike,245.0,500.0
book,NaN,40.0
glasses,NaN,110.0
pants,25.0,45.0
watch,55.0,NaN


**Notes:**
* The columns of the resulting DataFrame are constructed from the dictionary keys.
* The rows of the resulting DataFrame are constructed from the union of the Series' index labels.
* NaN stands for "Not a Number"; it is how pandas indicates that a cell in the table has no value. Because NaN is a floating-point value, a numeric column that contains it is shown as floats (245.0 rather than 245).\
 **Explanation:** for example, the value at row "watch" and column "Alice" is NaN because Alice's Series has no "watch" label; that label appears only in Bob's Series.
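
The notes above can be checked with a short sketch. It rebuilds the same `items` dictionary so that it runs on its own; the names `carts`, `missing`, and `missing_per_person` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same dictionary of Series as in the cell above
items = {'Bob': pd.Series(data=[245, 25, 55], index=['bike', 'pants', 'watch']),
         'Alice': pd.Series(data=[40, 110, 500, 45], index=['book', 'glasses', 'bike', 'pants'])}
carts = pd.DataFrame(items)

# The row labels are the union of the labels of both Series
print(list(carts.index))

# isna() returns True for every cell that holds NaN
missing = carts.isna()
print(missing.loc['watch', 'Alice'])

# Summing the boolean table counts the missing cells in each column
missing_per_person = missing.sum()
print(missing_per_person)
```

Here Bob's column has two missing cells (book, glasses) and Alice's has one (watch).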

In [5]:
# We create a dictionary of Pandas Series without explicit indexes (each Series gets a default integer index)
data = {'Bob' : pd.Series([245, 25, 55]),
        'Alice' : pd.Series([40, 110, 500, 45])}

df = pd.DataFrame(data)
df

,Bob,Alice
0,245.0,40
1,25.0,110
2,55.0,500
3,NaN,45


### Create a DataFrame From a Dictionary of Lists. 
**Important Note:** all lists must have the same length.

In [8]:
# We create a dictionary of lists (arrays)
data = {'Integers' : [1,2,3],
        'Floats' : [4.5, 8.2, 9.6]}

# We create a DataFrame 
df = pd.DataFrame(data)

# We display the DataFrame
df

,Integers,Floats
0,1,4.5
1,2,8.2
2,3,9.6


Notice that since the data dictionary we created provides no row labels, Pandas automatically uses a numerical row index when it creates the DataFrame.

We can, however, assign labels to the row index by using the index keyword in the pd.DataFrame() function.

In [9]:
# We create a dictionary of lists (arrays)
data = {'Integers' : [1,2,3],
        'Floats' : [4.5, 8.2, 9.6]}

# We create a DataFrame and provide the row index
df = pd.DataFrame(data, index = ['label 1', 'label 2', 'label 3'])
df

,Integers,Floats
label 1,1,4.5
label 2,2,8.2
label 3,3,9.6
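
As noted at the start of this section, all lists must have the same length. A minimal sketch of what happens otherwise, assuming a hypothetical `bad_data` dictionary whose second list is one element short:

```python
import pandas as pd

# Hypothetical input: 'Floats' has two elements, 'Integers' has three
bad_data = {'Integers': [1, 2, 3],
            'Floats': [4.5, 8.2]}

try:
    pd.DataFrame(bad_data)
    raised = False
except ValueError as err:
    # pandas refuses to build a DataFrame from lists of unequal length
    raised = True
    print('Could not build the DataFrame:', err)
```

Wrapping each list in a pd.Series, as in the earlier dictionary-of-Series example, is one way to allow different lengths, with the gaps filled by NaN.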


### Create a DataFrame From a List of Dictionaries. 

In [10]:
# We create a list of Python dictionaries
items2 = [{'bikes': 20, 'pants': 30, 'watches': 35}, 
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants':5}]

# We create a DataFrame 
store_items = pd.DataFrame(items2)
store_items

,bikes,pants,watches,glasses
0,20,30,35,NaN
1,15,5,10,50.0


**Notes:**
* The columns of the resulting DataFrame are constructed from the ***dictionaries'*** keys.
* The number of rows equals the number of dictionaries in the list: ***each dictionary is a row***.

Again, notice that since the items2 list we created provides no row labels, Pandas automatically uses a numerical row index when it creates the DataFrame.

As before, we can assign labels to the row index by using the index keyword in the pd.DataFrame() function.

In [11]:
# We create a list of Python dictionaries
items2 = [{'bikes': 20, 'pants': 30, 'watches': 35}, 
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants':5}]

# We create a DataFrame and provide the row index
store_items = pd.DataFrame(items2, index = ['store 1', 'store 2'])

# We display the DataFrame
store_items

,bikes,pants,watches,glasses
store 1,20,30,35,NaN
store 2,15,5,10,50.0
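
The notes earlier in this section (one row per dictionary, one column per distinct key) can be verified with a small sketch. The helper name `all_keys` is illustrative only.

```python
import pandas as pd

items2 = [{'bikes': 20, 'pants': 30, 'watches': 35},
          {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5}]
store_items = pd.DataFrame(items2, index=['store 1', 'store 2'])

# Collect every key that appears in at least one dictionary
all_keys = set().union(*items2)

# One row per dictionary, one column per distinct key
print(store_items.shape == (len(items2), len(all_keys)))

# The first dictionary has no 'glasses' key, so that cell is NaN
print(store_items['glasses'].isna().tolist())
```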


# DataFrame Characteristics

In [13]:
# We print some information about shopping_carts
print('shopping_carts has shape:', shopping_carts.shape) 
print('shopping_carts has dimension:', shopping_carts.ndim) # number of dimensions
print('shopping_carts has a total of:', shopping_carts.size, 'elements') # total number of elements
print('\nThe data in shopping_carts is:\n', shopping_carts.values) # all DataFrame values
print('\nThe row index in shopping_carts is:', shopping_carts.index) # rows labels
print('\nThe column index in shopping_carts is:', shopping_carts.columns) # columns labels

shopping_carts has shape: (5, 2)
shopping_carts has dimension: 2
shopping_carts has a total of: 10 elements

The data in shopping_carts is:
 [[245. 500.]
 [ nan  40.]
 [ nan 110.]
 [ 25.  45.]
 [ 55.  nan]]

The row index in shopping_carts is: Index(['bike', 'book', 'glasses', 'pants', 'watch'], dtype='object')

The column index in shopping_carts is: Index(['Bob', 'Alice'], dtype='object')
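
A short sketch of how these attributes relate to one another, rebuilding `shopping_carts` so that it runs on its own. The names `n_rows`, `n_cols`, and `non_missing` are illustrative.

```python
import pandas as pd

items = {'Bob': pd.Series(data=[245, 25, 55], index=['bike', 'pants', 'watch']),
         'Alice': pd.Series(data=[40, 110, 500, 45], index=['book', 'glasses', 'bike', 'pants'])}
shopping_carts = pd.DataFrame(items)

# shape is a (rows, columns) tuple; ndim is the length of that tuple
n_rows, n_cols = shopping_carts.shape
print(len(shopping_carts.shape) == shopping_carts.ndim)

# size is rows * columns and includes the NaN cells
print(shopping_carts.size == n_rows * n_cols)

# count() ignores NaN, so this is the number of cells that hold a value
non_missing = int(shopping_carts.count().sum())
print(non_missing)
```

The difference between size and the non-missing count is the number of NaN cells in the table.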


# Creating a Partial DataFrame
* **Picking columns:** pd.DataFrame(dictionary, columns=[selected columns])
* **Picking rows:** pd.DataFrame(dictionary, index=[selected rows])

In [18]:
# The same dictionary of Series as before; the cells below select parts of it
items = {'Bob' : pd.Series(data = [245, 25, 55], index = ['bike', 'pants', 'watch']),
         'Alice' : pd.Series(data = [40, 110, 500, 45], index = ['book', 'glasses', 'bike', 'pants'])}

#### Columns

In [19]:
# We create a DataFrame that only has Bob's data
bob_shopping_cart = pd.DataFrame(items, columns=['Bob'])
bob_shopping_cart

,Bob
bike,245
pants,25
watch,55


#### Rows

In [20]:
# We create a DataFrame that only has selected items for both Alice and Bob
sel_shopping_cart = pd.DataFrame(items, index = ['pants', 'book'])
sel_shopping_cart

,Bob,Alice
pants,25.0,45
book,NaN,40


#### Rows & Columns

In [21]:
# We create a DataFrame that only has selected items for Alice
alice_sel_shopping_cart = pd.DataFrame(items, index = ['glasses', 'bike'], columns = ['Alice'])
alice_sel_shopping_cart

,Alice
glasses,110
bike,500
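
One behavior worth being aware of when picking rows this way: with a dictionary of Series, a label passed to index that does not appear in the data is not an error but produces a row of NaN. A minimal sketch, where the label 'hat' and the name `sel` are hypothetical and chosen only for illustration:

```python
import pandas as pd

items = {'Bob': pd.Series(data=[245, 25, 55], index=['bike', 'pants', 'watch']),
         'Alice': pd.Series(data=[40, 110, 500, 45], index=['book', 'glasses', 'bike', 'pants'])}

# 'pants' exists in both Series; 'hat' exists in neither
sel = pd.DataFrame(items, index=['pants', 'hat'])
print(sel)

# Every cell in the 'hat' row is NaN
print(sel.loc['hat'].isna().all())
```

Checking the result with isna() after selecting rows is a simple way to catch misspelled labels.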
