## Creating and Combining DataFrames
<b>class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)</b>

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. 

<b>class pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)</b>

One-dimensional ndarray with axis labels (including time series).

Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).
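
As a minimal sketch of the two behaviours described above (arithmetic aligning on labels, and statistical methods skipping missing data), the following uses two small Series. The names `a`, `b`, `total` and their values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two Series that share only some of their labels
a = pd.Series([1.0, 2.0, 3.0], index=['x', 'y', 'z'])
b = pd.Series([10.0, 20.0], index=['y', 'z'])

# Arithmetic aligns on labels; 'x' has no partner in b, so the result is NaN there
total = a + b
print(total)

# Statistical methods skip NaN by default
print(total.sum())   # 35.0
print(total.mean())  # 17.5
```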

### Here are the main steps we will go through
* How to create a DataFrame using pandas?
* How to combine two datasets using pandas?

Here is a small illustration of the structure.

<img style="float: left;" src="https://www.tutorialspoint.com/python_pandas/images/structure_table.jpg"></img>

In [5]:
import pandas as pd
import numpy as np

#### How to create a DataFrame using pandas?

In [16]:
# working with a Series
# create a Series of five random values
s = pd.Series(np.random.randn(5))
# wrap the Series in a one-column DataFrame
df = pd.DataFrame(s, columns=['Column_1'])
df

,Column_1
0,-0.649263
1,-0.115861
2,-1.249926
3,0.661612
4,-0.925243
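
Another way to get a one-column DataFrame from a Series is `Series.to_frame`. The sketch below is only an illustration: the seeded generator, the seed value, and the name `df_alt` are assumptions added here so that the values are the same on every run.

```python
import numpy as np
import pandas as pd

# Seeded generator so the values are reproducible (the seed is arbitrary)
rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(5))

# to_frame wraps the Series in a DataFrame with a single named column
df_alt = s.to_frame(name='Column_1')
print(df_alt)
```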


In [8]:
# sort rows by the values in Column_1 (ascending)
df.sort_values(by='Column_1')

,Column_1
0,-1.446901
2,-0.092001
4,0.337288
3,0.841177
1,1.308139


In [10]:
# boolean indexing
# returns all rows whose Column_1 value
# is less than or equal to 1
df[df['Column_1'] <= 1]

,Column_1
0,-1.446901
2,-0.092001
3,0.841177
4,0.337288
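
Boolean indexing works in two steps: a comparison produces a True/False mask, and the mask selects rows. A small sketch with fixed, made-up values (the names `mask`, `small`, and `between` are hypothetical) makes the intermediate mask visible and shows how two conditions can be combined:

```python
import pandas as pd

df = pd.DataFrame({'Column_1': [-1.5, 0.2, 0.9, 1.3, 2.4]})

# The comparison returns a boolean Series, one entry per row
mask = df['Column_1'] <= 1
print(mask.tolist())  # [True, True, True, False, False]

# Indexing with the mask keeps only the rows marked True
small = df[mask]

# Combine conditions with & (and) or | (or); each condition needs parentheses
between = df[(df['Column_1'] > 0) & (df['Column_1'] <= 1)]
print(between)
```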


In [230]:
# create a simple Series with explicit index labels
obj2 = pd.Series(np.random.randn(5), index=['d', 'b', 'a', 'c', 'e'])
obj2

d   -0.691226
b    0.611193
a   -0.022808
c    0.717137
e   -1.631156
dtype: float64

In [20]:
obj2.index

Index(['d', 'b', 'a', 'c', 'e'], dtype='object')

In [229]:
# returns the value in e
obj2['e']

0.35472068069039486

In [26]:
# returns all values that are greater than -2
obj2[obj2 > -2]

b   -0.140732
a   -0.310954
c   -1.305166
e    0.354721
dtype: float64

In [27]:
# we can multiply a Series by a scalar (applied element-wise)
obj2 * 2

d   -4.177476
b   -0.281464
a   -0.621908
c   -2.610332
e    0.709441
dtype: float64

In [28]:
# membership test: checks whether 'b' is one of the index labels
'b' in obj2

True

In [228]:
# returns False, because 'g' is not one of the index labels
'g' in obj2

False

In [39]:
# Let's say we have this data
sdata = {'Cat': 24, 'Dog': 11, 'Fox': 18, 'Horse': 1000}
obj3 = pd.Series(sdata)
obj3

Cat        24
Dog        11
Fox        18
Horse    1000
dtype: int64

In [227]:
# define a list of labels and use it as the index;
# labels that are not keys of sdata ('Lion') get NaN
sindex = ['Lion', 'Dog', 'Cat', 'Horse']
obj4 = pd.Series(sdata, index=sindex)
obj4

Lion        NaN
Dog        11.0
Cat        24.0
Horse    1000.0
dtype: float64

In [226]:
# check which values are missing (NaN)
obj4.isnull()

Lion      True
Dog      False
Cat      False
Horse    False
dtype: bool
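
Once the missing values are flagged, they can be counted or dropped. The following sketch rebuilds a Series with the same labels as above under the hypothetical name `scores`; `n_missing`, `present`, and `cleaned` are also names introduced here for illustration.

```python
import numpy as np
import pandas as pd

scores = pd.Series([np.nan, 11.0, 24.0, 1000.0],
                   index=['Lion', 'Dog', 'Cat', 'Horse'])

# True counts as 1, so summing the mask counts the missing values
n_missing = scores.isnull().sum()
print(n_missing)  # 1

# Keep only the non-missing entries, either with a mask or with dropna
present = scores[scores.notnull()]
cleaned = scores.dropna()
print(cleaned)
```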

In [44]:
# we can add two Series together; values are aligned by index label,
# and a label missing from either side gives NaN
obj3 + obj4

Cat        48.0
Dog        22.0
Fox         NaN
Horse    2000.0
Lion        NaN
dtype: float64
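
If NaN for unmatched labels is not wanted, `Series.add` accepts a `fill_value` that stands in for a label missing on one side. This is a sketch reusing the data above, with `total` as a hypothetical name. Note that 'Lion' stays NaN, because it is missing on both sides.

```python
import numpy as np
import pandas as pd

sdata = {'Cat': 24, 'Dog': 11, 'Fox': 18, 'Horse': 1000}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=['Lion', 'Dog', 'Cat', 'Horse'])

# Treat a value missing on one side as 0 instead of propagating NaN
total = obj3.add(obj4, fill_value=0)
print(total)
```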

In [224]:
# we can create a Series by calling the pd.Series constructor on a list
programming = pd.Series([89,78,90,100,98])
programming

0     89
1     78
2     90
3    100
4     98
dtype: int64

In [223]:
# replace the default integer index with language names
programming.index = ['C++', 'C', 'R', 'Python', 'Java']
programming

C++        89
C          78
R          90
Python    100
Java       98
dtype: int64

In [102]:
# let's create simple data
data = {'Programming': ['C++', 'C', 'R', 'Python', 'Java'],
        'Year': [1998, 1972, 1993, 1980, 1991],
        'Popular': [90, 79, 75, 99, 97]}
frame = pd.DataFrame(data)
frame

,Popular,Programming,Year
0,90,C++,1998
1,79,C,1972
2,75,R,1993
3,99,Python,1980
4,97,Java,1991


In [103]:
# choose the column order explicitly
pd.DataFrame(data, columns=['Popular', 'Programming', 'Year'])

,Popular,Programming,Year
0,90,C++,1998
1,79,C,1972
2,75,R,1993
3,99,Python,1980
4,97,Java,1991


In [133]:
data2 = pd.DataFrame(data, columns=['Year', 'Programming', 'Popular', 'Users'],
                    index=[1,2,3,4,5])
data2

,Year,Programming,Popular,Users
1,1998,C++,90,NaN
2,1972,C,79,NaN
3,1993,R,75,NaN
4,1980,Python,99,NaN
5,1991,Java,97,NaN


In [134]:
data2['Programming']

1       C++
2         C
3         R
4    Python
5      Java
Name: Programming, dtype: object

In [135]:
data2.Popular

1    90
2    79
3    75
4    99
5    97
Name: Popular, dtype: int64

In [137]:
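# fill the Users column with random values in [0, 104)
data2['Users'] = np.random.random(5)*104
# round numeric columns to whole numbers; non-numeric columns are left unchanged
data2 = data2.round()
data2

,Year,Programming,Popular,Users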
1,1998,C++,90,83.0
2,1972,C,79,43.0
3,1993,R,75,25.0
4,1980,Python,99,14.0
5,1991,Java,97,38.0
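
Attribute-style access such as `frame.Users` is convenient for reading a column, but assigning to an attribute that is not already a column does not create one. Bracket assignment is the safer habit. The sketch below illustrates the difference; `frame`, `Rank`, and the seed are hypothetical, and the warning pandas emits is silenced only to keep the output clean.

```python
import warnings

import numpy as np
import pandas as pd

frame = pd.DataFrame({'Popular': [90, 79, 75]})

# Bracket assignment creates the column if it does not exist yet
rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility
frame['Users'] = (rng.random(3) * 104).round()

# Attribute assignment to a new name only sets a Python attribute
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    frame.Rank = [1, 2, 3]

print('Users' in frame.columns)  # True
print('Rank' in frame.columns)   # False
```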


#### How to combine two datasets using pandas?

In [169]:
# build two small one-column DataFrames that we will combine
merg1 = {'Edit': 24, 'View': 11, 'Inser': 18, 'Cell': 40}
merg1 = pd.Series(merg1)
merg1 = pd.DataFrame(merg1, columns=['Merge1'])

merg2 = {'Kernel':50, 'Navigate':27, 'Widgets':29,'Help':43}
merg2 = pd.Series(merg2)
merg2 = pd.DataFrame(merg2, columns=['Merge2'])

In [170]:
merg1

,Merge1
Cell,40
Edit,24
Inser,18
View,11


In [171]:
merg2

,Merge2
Help,43
Kernel,50
Navigate,27
Widgets,29


In [195]:
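# left-join merg2 onto merg1 by index label;
# no labels match here, so Merge2 is NaN in every row
# (pd.merge(merg1, merg2, left_index=True, right_index=True) defaults to an
# inner join and would return no rows for this data)
join = merg1.join(merg2)
join

,Merge1,Merge2
Cell,40,NaN
Edit,24,NaN
Inser,18,NaN
View,11,NaN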
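
`DataFrame.join` is a left join by default: it keeps the rows of the calling frame and looks up matching index labels in the other one. The sketch below uses smaller, hypothetical frames (`left`, `right`, `overlapping`) to contrast the default with `how='outer'`, and to show what a matching label looks like.

```python
import pandas as pd

left = pd.DataFrame({'Merge1': [24, 11]}, index=['Edit', 'View'])
right = pd.DataFrame({'Merge2': [50, 43]}, index=['Kernel', 'Help'])

# Default (left) join: only the labels of `left` survive, Merge2 is all NaN
left_join = left.join(right)

# Outer join: the union of both indexes, with NaN where a side has no value
outer_join = left.join(right, how='outer')
print(outer_join)

# When a label exists on both sides, the values line up in the same row
overlapping = pd.DataFrame({'Merge2': [7]}, index=['Edit'])
matched = left.join(overlapping)
print(matched)
```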


In [199]:
# replace all NA/null values with 12
join = join.fillna(12)
join

,Merge1,Merge2
Cell,40,12.0
Edit,24,12.0
Inser,18,12.0
View,11,12.0
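
`fillna` also accepts a dict that maps column names to fill values, which helps when one constant does not suit every column. A minimal sketch with a made-up frame (`df_gaps` and `filled` are hypothetical names):

```python
import numpy as np
import pandas as pd

df_gaps = pd.DataFrame({'A': [1.0, np.nan], 'B': [np.nan, 4.0]})

# Fill column A with 0 and column B with its own mean (NaN is skipped by mean)
filled = df_gaps.fillna({'A': 0, 'B': df_gaps['B'].mean()})
print(filled)
```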


In [201]:
# compute and append one or more new columns;
# the lambda receives the DataFrame that assign is called on
join = join.assign(Area=lambda df: df.Merge1*df.Merge2)
join

,Merge1,Merge2,Area
Cell,40,12.0,480.0
Edit,24,12.0,288.0
Inser,18,12.0,216.0
View,11,12.0,132.0
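
A callable passed to `assign` receives the DataFrame that `assign` is called on, so later steps in a chain can use columns created by earlier steps. The sketch below uses a hypothetical two-row `table` with the same numbers as above; `assign` returns a new frame and leaves the original untouched.

```python
import pandas as pd

table = pd.DataFrame({'Merge1': [40, 24], 'Merge2': [12.0, 12.0]},
                     index=['Cell', 'Edit'])

result = (
    table
    # d is the frame as it exists at this point in the chain
    .assign(Area=lambda d: d.Merge1 * d.Merge2)
    # the second step can already see the Area column
    .assign(Volume=lambda d: d.Merge1 * d.Merge2 * d.Area)
)
print(result)
```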


In [205]:
# add a single column
join['Volume'] = join.Merge1*join.Merge2*join.Area
join

,Merge1,Merge2,Area,Volume
Cell,40,12.0,480.0,230400.0
Edit,24,12.0,288.0,82944.0
Inser,18,12.0,216.0,46656.0
View,11,12.0,132.0,17424.0


In [209]:
join.head(2)

,Merge1,Merge2,Area,Volume
Cell,40,12.0,480.0,230400.0
Edit,24,12.0,288.0,82944.0


In [208]:
join.tail(2)

,Merge1,Merge2,Area,Volume
Inser,18,12.0,216.0,46656.0
View,11,12.0,132.0,17424.0


In [210]:
# randomly select a fraction of the rows (here half of them)
join.sample(frac=0.5)

,Merge1,Merge2,Area,Volume
Edit,24,12.0,288.0,82944.0
Inser,18,12.0,216.0,46656.0
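
Because `sample` is random, the selected rows change from run to run. Passing `random_state` makes the selection repeatable. A sketch with a hypothetical `table` and an arbitrary seed:

```python
import pandas as pd

table = pd.DataFrame({'Merge1': [40, 24, 18, 11]},
                     index=['Cell', 'Edit', 'Inser', 'View'])

# The same seed always selects the same rows
first = table.sample(frac=0.5, random_state=0)
second = table.sample(frac=0.5, random_state=0)
print(first.equals(second))  # True
```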


In [211]:
#order rows by values of a column (low to high)
join.sort_values('Volume')

,Merge1,Merge2,Area,Volume
View,11,12.0,132.0,17424.0
Inser,18,12.0,216.0,46656.0
Edit,24,12.0,288.0,82944.0
Cell,40,12.0,480.0,230400.0


In [213]:
# order rows by values of a column (high to low)
join.sort_values('Volume', ascending=False)

,Merge1,Merge2,Area,Volume
Cell,40,12.0,480.0,230400.0
Edit,24,12.0,288.0,82944.0
Inser,18,12.0,216.0,46656.0
View,11,12.0,132.0,17424.0


In [217]:
# rename columns: Merge1 becomes X and Merge2 becomes Y
join = join.rename(columns={'Merge1':'X','Merge2':'Y'})

In [218]:
join

,X,Y,Area,Volume
Cell,40,12.0,480.0,230400.0
Edit,24,12.0,288.0,82944.0
Inser,18,12.0,216.0,46656.0
View,11,12.0,132.0,17424.0


In [220]:
#count number of rows with each unique value of variable
join['Y'].value_counts()

12.0    4
Name: Y, dtype: int64
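
With only one distinct value the count is not very informative. A small sketch with made-up values (`values`, `counts`, and `shares` are hypothetical names) shows the result when values differ, and how `normalize=True` turns counts into proportions:

```python
import pandas as pd

values = pd.Series([12.0, 12.0, 7.0, 12.0])

# Absolute counts, most frequent value first
counts = values.value_counts()
print(counts)

# Relative frequencies that sum to 1
shares = values.value_counts(normalize=True)
print(shares)
```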

In [221]:
#number of rows in dataframe
len(join)

4

In [222]:
#descriptive statistics
join.describe()

,X,Y,Area,Volume
count,4.0,4.0,4.0,4.0
mean,23.25,12.0,279.0,94356.0
std,12.365948,0.0,148.391374,94572.769696
min,11.0,12.0,132.0,17424.0
25%,16.25,12.0,195.0,39348.0
50%,21.0,12.0,252.0,64800.0
75%,28.0,12.0,336.0,119808.0
max,40.0,12.0,480.0,230400.0


### Thank you, more to come soon!