**Pandas Tutorial: DataFrames in Python**

Exercise Link: https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python

Explore data analysis with Python. Pandas DataFrames make manipulating your data easy, from selecting or replacing columns and indices to reshaping your data.

Pandas is a popular Python package for data science, and with good reason: it offers powerful, expressive and flexible data structures that make data manipulation and analysis easy, among many other things. The DataFrame is one of these structures.

This tutorial covers Pandas DataFrames, from basic manipulations to advanced operations, by tackling 11 of the most popular questions, so that you understand, and can avoid, the doubts of the Pythonistas who have gone before you.

**Pandas DataFrame**

In [5]:
# import numpy and pandas
import numpy as np
import pandas as pd

**How to create a pandas dataframe?**

In [6]:
data = np.array([['','Col1','Col2'],
                ['Row1',1,2],
                ['Row2',3,4]])
                
print(pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:]))

     Col1 Col2
Row1    1    2
Row2    3    4


In [7]:
# A 2D array that can serve as input to a DataFrame
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(my_2darray)

# A dictionary that can serve as input to a DataFrame
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
print(my_dict)

# A DataFrame that can itself serve as input to a DataFrame
my_df = pd.DataFrame(data=[4,5,6,7], index=range(0,4), columns=['A'])
print(my_df)

# A Series that can serve as input to a DataFrame
my_series = pd.Series({"Belgium":"Brussels", "India":"New Delhi",
                       "United Kingdom":"London", "United States":"Washington"})
print(my_series)

[[1 2 3]
 [4 5 6]]
{1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
   A
0  4
1  5
2  6
3  7
Belgium             Brussels
India              New Delhi
United Kingdom        London
United States     Washington
dtype: object
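
The cell above only prints the four inputs. As a minimal sketch, each of them can be passed straight to `pd.DataFrame`; the `df_from_*` names are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
my_df = pd.DataFrame(data=[4, 5, 6, 7], index=range(0, 4), columns=['A'])
my_series = pd.Series({"Belgium": "Brussels", "India": "New Delhi",
                       "United Kingdom": "London", "United States": "Washington"})

# 2D array: rows and columns get default integer labels
df_from_array = pd.DataFrame(my_2darray)

# Dictionary: keys become column labels, lists become column values
df_from_dict = pd.DataFrame(my_dict)

# DataFrame: the result has the same labels and values as the input
df_from_df = pd.DataFrame(my_df)

# Series: the Series index becomes the row index of a one-column DataFrame
df_from_series = pd.DataFrame(my_series)

print(df_from_array.shape, df_from_dict.shape,
      df_from_df.shape, df_from_series.shape)
```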


In [8]:
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

# Use the `shape` property
print(df.shape)

# Or use the `len()` function with the `index` property
print(len(df.index))

(2, 3)
2


**Fundamental DataFrame Operations**

In [17]:
df_2 = pd.DataFrame([['1', '2', '3'],
                     ['4', '5', '6'],
                     ['7', '8', '9']],
                    columns=['A', 'B', 'C'])


In [18]:
print(df_2.shape)

(3, 3)


In [19]:
# Using `iloc[]` with a (row position, column position) pair
print(df_2.iloc[0, 0])

# Using `loc[]` with a (row label, column label) pair
print(df_2.loc[0, 'A'])

# Using `at[]`
print(df_2.at[0,'A'])

# Using `iat[]`
print(df_2.iat[0,0])

1
1
1
1


In [20]:
# Use `iloc[]` to select row `0`
print(df_2.iloc[0])

# Use `loc[]` to select column `'A'`
print(df_2.loc[:,'A'])

A    1
B    2
C    3
Name: 0, dtype: object
0    1
1    4
2    7
Name: A, dtype: object


In [22]:
# Print out your DataFrame `df_2` to check it out
print(df_2)

# Set 'C' as the index; `set_index()` returns a new DataFrame and leaves `df_2` unchanged
df_2.set_index('C')

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9


   A  B
C      
3  1  2
6  4  5
9  7  8
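
Because `set_index()` returns its result instead of changing the DataFrame it is called on, the result has to be assigned to a name to be kept. A minimal sketch, where `df_2_indexed` is an illustrative name not used elsewhere in this notebook:

```python
import pandas as pd

df_2 = pd.DataFrame([['1', '2', '3'],
                     ['4', '5', '6'],
                     ['7', '8', '9']],
                    columns=['A', 'B', 'C'])

# Keep the re-indexed result under a new name
df_2_indexed = df_2.set_index('C')

print(list(df_2.columns))          # the original still has column 'C'
print(list(df_2_indexed.columns))  # 'C' has moved out of the columns
print(df_2_indexed.index.name)     # and now names the index
```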


In [23]:
df_3 = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), index= [2, 'A', 4], columns=[48, 49, 50])

In [24]:
print(df_3)

   48  49  50
2   1   2   3
A   4   5   6
4   7   8   9


In [25]:
# Pass `2` to `loc`: selects the row whose label is `2`
print(df_3.loc[2])

# Pass `2` to `iloc`: selects the row at position `2`
print(df_3.iloc[2])

# `ix` was deprecated in pandas 0.20 and removed in pandas 1.0;
# use `loc` for label-based indexing or `iloc` for positional indexing

48    1
49    2
50    3
Name: 2, dtype: int64
48    7
49    8
50    9
Name: 4, dtype: int64


In [27]:
df_4 = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), index= [2.5, 12.6, 4.8], columns=[48, 49, 50])

# There's no index labeled `2`, so use `iloc` to overwrite the row at position `2`
df_4.iloc[2] = [60, 50, 40]
print(df_4)

# `loc` is label-based: this adds a new row labeled `2` with the new values
df_4.loc[2] = [11, 12, 13]
print(df_4)

      48  49  50
2.5    1   2   3
12.6   4   5   6
4.8   60  50  40
      48  49  50
2.5    1   2   3
12.6   4   5   6
4.8   60  50  40
2.0   11  12  13


In [33]:
df_5 = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])

# Use `.index` to copy the row labels into a new column `D`
df_5['D'] = df_5.index

# Print `df_5`
print(df_5)

   A  B  C  D
0  1  2  3  0
1  4  5  6  1
2  7  8  9  2


In [35]:
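# Check out the index of your DataFrame (here it is already the default 0, 1, 2)
print(df_5)

# Use `reset_index()` to reset the index; `drop=True` discards the old index
# instead of inserting it as a column
df_reset = df_5.reset_index(level=0, drop=True)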

# Print `df_reset`
print(df_reset)

   A  B  C  D
0  1  2  3  0
1  4  5  6  1
2  7  8  9  2
   A  B  C  D
0  1  2  3  0
1  4  5  6  1
2  7  8  9  2


In [40]:
# Remove rows with duplicate index labels, keeping the last occurrence of each
df_6 = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [40, 50, 60], [23, 35, 37]]), 
                  index= [2.5, 12.6, 4.8, 4.8, 2.5], 
                  columns=[48, 49, 50])
                  
df_6.reset_index().drop_duplicates(subset='index', keep='last').set_index('index')

       48  49  50
index            
12.6    4   5   6
4.8    40  50  60
2.5    23  35  37
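
The same rows can be selected without moving the index into a column and back. The sketch below uses `Index.duplicated()` to build a boolean mask; the names `is_earlier_duplicate` and `df_6_unique` are illustrative only.

```python
import numpy as np
import pandas as pd

df_6 = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],
                                   [40, 50, 60], [23, 35, 37]]),
                    index=[2.5, 12.6, 4.8, 4.8, 2.5],
                    columns=[48, 49, 50])

# keep='last' marks every occurrence of a repeated label except the last one
is_earlier_duplicate = df_6.index.duplicated(keep='last')

# Keep only the rows that are not marked as earlier duplicates
df_6_unique = df_6[~is_earlier_duplicate]
print(df_6_unique)
```

Unlike the `reset_index()` round trip, this leaves the index unnamed.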


In [46]:
# iteration over dataframe

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])

for index, row in df.iterrows():
    print(row['A'], row['B'])

1 2
4 5
7 8
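
`iterrows()` builds a Series for every row, which can be slow on large DataFrames. As a hedged sketch of two common alternatives, `itertuples()` yields lightweight namedtuples, and a column-wise expression avoids the Python loop altogether. The names `pairs` and `sums` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  columns=['A', 'B', 'C'])

# itertuples() yields one namedtuple per row; index=False leaves the row label out
pairs = [(row.A, row.B) for row in df.itertuples(index=False)]
print(pairs)

# For simple arithmetic, operate on whole columns instead of looping
sums = df['A'] + df['B']
print(sums.tolist())
```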


In [47]:
# use melt to reshape dataframe
# The `people` DataFrame
people = pd.DataFrame({'FirstName' : ['John', 'Jane'],
                       'LastName' : ['Doe', 'Austen'],
                       'BloodType' : ['A-', 'B+'],
                       'Weight' : [90, 64]})

# Use `melt()` on the `people` DataFrame
print(pd.melt(people, id_vars=['FirstName', 'LastName'], var_name='measurements'))

  FirstName LastName measurements value
0      John      Doe    BloodType    A-
1      Jane   Austen    BloodType    B+
2      John      Doe       Weight    90
3      Jane   Austen       Weight    64
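
In the output above, `value` mixes strings and numbers because both `BloodType` and `Weight` are melted into it. A minimal sketch of restricting the melt to one column with `value_vars` and renaming the result column with `value_name`; `weights_long` and `reading` are illustrative names.

```python
import pandas as pd

people = pd.DataFrame({'FirstName': ['John', 'Jane'],
                       'LastName': ['Doe', 'Austen'],
                       'BloodType': ['A-', 'B+'],
                       'Weight': [90, 64]})

# value_vars limits the melt to 'Weight'; value_name replaces the default 'value' label
weights_long = pd.melt(people,
                       id_vars=['FirstName', 'LastName'],
                       value_vars=['Weight'],
                       var_name='measurements',
                       value_name='reading')
print(weights_long)
```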


**Pivoting Dataframe**

In [48]:
# Import the Pandas library
import pandas as pd

# Your DataFrame
products = pd.DataFrame({'category': ['Cleaning', 'Cleaning', 'Entertainment', 'Entertainment', 'Tech', 'Tech'],
                        'store': ['Walmart', 'Dia', 'Walmart', 'Fnac', 'Dia','Walmart'],
                        'price':[11.42, 23.50, 19.99, 15.95, 19.99, 111.55],
                        'testscore': [4, 3, 5, 7, 5, 8]})

# Pivot your `products` DataFrame with `pivot_table()`
pivot_products = products.pivot_table(index='category', columns='store', values='price', aggfunc='mean')

# Check out the results
print(pivot_products)

store            Dia   Fnac  Walmart
category                            
Cleaning       23.50    NaN    11.42
Entertainment    NaN  15.95    19.99
Tech           19.99    NaN   111.55


In [49]:
# Import the Pandas library
import pandas as pd

# Construct the DataFrame
products = pd.DataFrame({'category': ['Cleaning', 'Cleaning', 'Entertainment', 'Entertainment', 'Tech', 'Tech'],
                        'store': ['Walmart', 'Dia', 'Walmart', 'Fnac', 'Dia','Walmart'],
                        'price':[11.42, 23.50, 19.99, 15.95, 55.75, 111.55],
                        'testscore': [4, 3, 5, 7, 5, 8]})

# Use `pivot()` to pivot your DataFrame
pivot_products = products.pivot(index='category', columns='store')

# Check out the results
print(pivot_products)

               price                testscore             
store            Dia   Fnac Walmart       Dia Fnac Walmart
category                                                  
Cleaning       23.50    NaN   11.42       3.0  NaN     4.0
Entertainment    NaN  15.95   19.99       NaN  7.0     5.0
Tech           55.75    NaN  111.55       5.0  NaN     8.0
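
`pivot()` works here because every (category, store) pair occurs at most once. The sketch below uses a small made-up DataFrame, `dup_products`, in which one pair is repeated, to show that `pivot()` then raises a `ValueError` while `pivot_table()` aggregates the repeated values.

```python
import pandas as pd

# Hypothetical data: ('Cleaning', 'Walmart') appears twice
dup_products = pd.DataFrame({'category': ['Cleaning', 'Cleaning', 'Tech'],
                             'store': ['Walmart', 'Walmart', 'Dia'],
                             'price': [10.0, 20.0, 55.75]})

# pivot() cannot choose between the two prices and raises ValueError
try:
    dup_products.pivot(index='category', columns='store', values='price')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print('pivot() failed:', err)

# pivot_table() combines the duplicates, here by taking their mean
table = dup_products.pivot_table(index='category', columns='store',
                                 values='price', aggfunc='mean')
print(table)
```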


In [50]:
# Creating an empty dataframe
df = pd.DataFrame(np.nan, index=[0,1,2,3], columns=['A'])
print(df)

    A
0 NaN
1 NaN
2 NaN
3 NaN


In [53]:
# creating an empty dataframe with a specific data type using dtype
df = pd.DataFrame(index=range(0,4),columns=['A'], dtype='float')
print(df)

    A
0 NaN
1 NaN
2 NaN
3 NaN
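
The two DataFrames above hold no data yet, but they do have rows, so pandas does not report them as empty. A short sketch of the difference, using the illustrative names `nan_filled` and `no_rows`:

```python
import numpy as np
import pandas as pd

# Four rows, every value missing
nan_filled = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A'])

# A column label but no rows at all
no_rows = pd.DataFrame(columns=['A'])

print(nan_filled.empty)              # False: rows exist even though all values are NaN
print(no_rows.empty)                 # True: there are no rows
print(nan_filled['A'].isna().all())  # True: every value in 'A' is missing
```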
