# Understanding Dataframe Structure Lab

The purpose of this lab is to familiarize you with the structure of a dataframe by building one step by step: starting from lists, converting the lists to Series, and finally combining the Series into a dataframe.  In practice, you'll usually create dataframes by loading data from files, either into an empty dataframe or into one created on the fly as the data is read.  Working through the manual procedure, however, should help you understand how a dataframe is put together.
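For comparison with the file-based route, here is a minimal sketch of loading tabular text with `pd.read_csv`. The CSV content, the column names, and the variable `scores` are made up for illustration; an in-memory string stands in for a real file so the cell runs on its own.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "name,score\nAda,90\nLin,85\n"

# read_csv accepts a file path or any file-like object
scores = pd.read_csv(io.StringIO(csv_text))

print(scores.shape)             # (rows, columns)
print(scores.columns.tolist())  # column labels taken from the header row
```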

Description 

A DataFrame is a pandas object that consists of rows and columns.

A DataFrame is a two-dimensional (2D) data structure.

A DataFrame represents data in tabular form, as rows and columns.

It is like a spreadsheet or a SQL table.

Data frames are built into the R programming language and its standard library.

Python makes DataFrames available through the pandas library.

A DataFrame object provides attributes and methods for working with the data.

Each column in a DataFrame is a Series object that consists of an array of labeled values, plus attributes and methods for working with the object.

The intersection of a column and row in a DataFrame is called an element, and each element can contain a value, which is called a datapoint.
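The points above can be checked on a small table. This is only a sketch: the `pets` DataFrame and its values are hypothetical and are not part of the lab data.

```python
import pandas as pd

# Hypothetical two-column table used only for illustration
pets = pd.DataFrame({"name": ["Rex", "Tom"], "legs": [4, 4]})

column = pets["name"]      # selecting one column returns a Series
print(type(column))

print(pets.shape)          # (number of rows, number of columns)

# A single element: the value at row label 1, column "name"
print(pets.at[1, "name"])
```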

### Create a simple Pandas Series from a list:

In [1]:
# make pandas available
import pandas as pd

In [2]:
# create a list
a = [1,7,2]

In [3]:
# confirm it's a list by printing it (the square brackets around the values indicate a list)
print(a)

[1, 7, 2]


In [4]:
type(a)        # type() returns the object's type, here list

list

### Series is a one-dimensional array with axis labels

In [5]:
# create a Series using the pandas Series constructor
myvar = pd.Series(a) # builds a Series from the list a
# print the Series to confirm its contents (index labels on the left, values on the right)
print(myvar)

0    1
1    7
2    2
dtype: int64


In [6]:
type(myvar)

pandas.core.series.Series

In [7]:
print(myvar[2])  # look up the value whose index label is 2

2
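With the default index, labels and positions happen to be the same numbers. The sketch below uses made-up string labels (`x`, `y`, `z`) to show the difference between looking up a value by label and by position; the name `labeled` is illustrative only.

```python
import pandas as pd

# Same values as the list a, but with hypothetical string labels
labeled = pd.Series([1, 7, 2], index=["x", "y", "z"])

print(labeled["y"])      # label-based lookup
print(labeled.iloc[2])   # position-based lookup (third value)
```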


### Convert a Series to a Dataframe

To convert a pandas Series to a DataFrame, use the Series method to_frame().

In [8]:
# converting the series into the dataframe
dataframe = myvar.to_frame()

In [9]:
print(dataframe)

   0
0  1
1  7
2  2


In [None]:
type(dataframe)

In [None]:
# display dataframe characteristics
# info() prints its report itself and returns None, so it is not wrapped in print()
dataframe.info(verbose=True)
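The frame printed earlier has a single column labeled `0`, because the Series had no name. As a small sketch, `to_frame()` also accepts a `name` argument that sets the column label; the label `value` and the variable names here are arbitrary choices for illustration.

```python
import pandas as pd

values = pd.Series([1, 7, 2])

# name= sets the label of the single resulting column
named_frame = values.to_frame(name="value")

print(named_frame.columns.tolist())
print(named_frame.shape)
```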


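The primary difference between a Series and a DataFrame is that a Series holds a single one-dimensional sequence of values, each tied to an index label.

In contrast, a DataFrame is a two-dimensional table made up of one or more Series that share the same index, with each Series forming a column.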
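One way to see this difference is to compare the `ndim` and `shape` attributes of the two objects. The short variable names below are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 7, 2])
f = s.to_frame()

print(s.ndim, s.shape)   # one dimension: only a length
print(f.ndim, f.shape)   # two dimensions: rows and columns
```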

### To convert Multiple Series to DataFrame

Define the Series one by one.

Convert each Series to a DataFrame, using either the to_frame() method or the pd.DataFrame() constructor.

Merge the separate DataFrames into a single DataFrame using the pandas concat() function.
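The three steps above can be sketched compactly with `to_frame()`. The data and the column labels (`Letter`, `Number`) are placeholders, not the lab's data.

```python
import pandas as pd

# Step 1: define the Series
letters = pd.Series(["a", "b"])
numbers = pd.Series([1, 2])

# Step 2: convert each Series to a one-column DataFrame
df_letters = letters.to_frame(name="Letter")
df_numbers = numbers.to_frame(name="Number")

# Step 3: place the DataFrames side by side (axis=1 means column-wise)
combined = pd.concat([df_letters, df_numbers], axis=1)

print(combined.shape)
print(combined.columns.tolist())
```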

In [10]:
import pandas as pd

first_name = ['Millie','Finn','Sadie','Gaten','Noah']
series_first_name = pd.Series(first_name)
print(series_first_name)

last_name = ['Brown','Wolfhard','Sink','Matarazzo','Schnapp']
series_last_name = pd.Series(last_name)
print(series_last_name)

age = [15, 17, 16, 17, 15]
series_age = pd.Series(age)
print(series_age)

0    Millie
1      Finn
2     Sadie
3     Gaten
4      Noah
dtype: object
0        Brown
1     Wolfhard
2         Sink
3    Matarazzo
4      Schnapp
dtype: object
0    15
1    17
2    16
3    17
4    15
dtype: int64


In [11]:
first_name = ['Millie', 'Finn', 'Sadie', 'Gaten', 'Noah']
series_first_name = pd.Series(first_name)
df_first_name = pd.DataFrame(series_first_name, columns=['First Name'])
print(df_first_name)

last_name = ['Brown', 'Wolfhard', 'Sink', 'Matarazzo', 'Schnapp']
series_last_name = pd.Series(last_name)
df_last_name = pd.DataFrame(series_last_name, columns=['Last Name'])
print(df_last_name)

age = [15, 17, 16, 17, 15]
series_age = pd.Series(age)
df_age = pd.DataFrame(series_age, columns=['Age'])
print(df_age)

  First Name
0     Millie
1       Finn
2      Sadie
3      Gaten
4       Noah
   Last Name
0      Brown
1   Wolfhard
2       Sink
3  Matarazzo
4    Schnapp
   Age
0   15
1   17
2   16
3   17
4   15


To concatenate pandas DataFrames, use the pandas concat() function.

concat() joins pandas objects along a particular axis, with optional set logic (union or intersection) applied to the labels on the other axes. Passing axis=1 places the DataFrames side by side as columns and aligns their rows by index.

In [12]:
df = pd.concat([df_first_name, df_last_name, df_age], axis=1)

In [13]:
type(df)

pandas.core.frame.DataFrame

In [14]:
print(df)

  First Name  Last Name  Age
0     Millie      Brown   15
1       Finn   Wolfhard   17
2      Sadie       Sink   16
3      Gaten  Matarazzo   17
4       Noah    Schnapp   15
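In the lab data, all three DataFrames share the index 0 to 4, so the set logic of `concat()` has no visible effect. The sketch below uses two hypothetical frames with only partly overlapping indexes to show the difference between the default union (`join="outer"`) and the intersection (`join="inner"`).

```python
import pandas as pd

# Hypothetical frames whose row labels only partly overlap
left = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"B": [10, 20, 30]}, index=[1, 2, 3])

# Default join="outer": keep the union of row labels, filling gaps with NaN
outer = pd.concat([left, right], axis=1)

# join="inner": keep only the row labels present in both frames
inner = pd.concat([left, right], axis=1, join="inner")

print(outer.index.tolist())
print(inner.index.tolist())
```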


The following links provide additional information and exercises for dataframes.

https://www.w3schools.com/datascience/ds_python_dataframe.asp
https://www.w3schools.com/python/pandas/pandas_series.asp
https://appdividend.com/2020/05/26/pandas-series-to_frame-convert-series-to-dataframe/#:~:text=How%20to%20convert%20Multiple%20Series%20to%20DataFrame%201,%28%29%20method.%20...%203%20Concat%20all%20three%20DataFrames
https://www.datasciencemadesimple.com/create-series-in-python-pandas/
https://pandas.pydata.org/docs/pandas.pdf