# DATAFRAME
---

A DataFrame is a 2-dimensional data structure of rows and columns, similar to a spreadsheet.
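
As a minimal sketch of this row-and-column structure, the example below builds a small DataFrame and inspects its shape, column labels, and row labels. The variable name `example` is an assumption made for illustration; the values are the company figures used later in this notebook.

```python
import pandas as pd

# Small table with two columns and three rows
example = pd.DataFrame({
    'Month': ['January', 'February', 'March'],
    'Income': [23500, 19700, 31150],
})

n_rows, n_cols = example.shape        # (number of rows, number of columns)
column_names = list(example.columns)  # labels of the columns
row_labels = list(example.index)      # default integer labels of the rows

print(n_rows, n_cols)
print(column_names)
print(row_labels)
```

Each column behaves like a labelled list of values, and each row is identified by its label in the index.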

## Creating DataFrame from Lists

You can create a DataFrame by starting with an empty one and adding columns to it. Each column is created from a list of values, and all of the lists must have the same length.

In [67]:
import pandas as pd

In [51]:
# creating an empty DataFrame
company = pd.DataFrame()

# creating data collections as lists
company_months = ['January','February','March']
company_income = [23500,19700,31150]

# adding columns to DataFrame
company['Month'] = company_months
company['Income'] = company_income

# displaying DataFrame contents
company

Index | Month    | Income
------|----------|-------
0     | January  | 23500
1     | February | 19700
2     | March    | 31150


### Tasks

Complete the data by adding the remaining months of the first half of the year and the income earned in these months. Then, display the data again.

In [52]:
df = pd.DataFrame({'Month': ['April', 'May', 'June'],
                   'Income': [19800, 25600, 31200]})

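# appending the new rows below the existing ones;
# ignore_index=True renumbers the rows from 0 to 5
company = pd.concat([company, df], ignore_index=True)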
company

Index | Month    | Income
------|----------|-------
0     | January  | 23500
1     | February | 19700
2     | March    | 31150
3     | April    | 19800
4     | May      | 25600
5     | June     | 31200


Display descriptive statistics for income earned.

In [24]:
company.describe()

Statistic | Income
----------|-------------
count     | 6.0
mean      | 25158.333333
std       | 5174.013594
min       | 19700.0
25%       | 20725.0
50%       | 24550.0
75%       | 29762.5
max       | 31200.0


Display company income for the months of the second quarter.

In [41]:
# rows at positions 3, 4 and 5 (April, May, June)
company['Income'].iloc[3:6]

3    19800
4    25600
5    31200
Name: Income, dtype: int64
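
The slice above selects rows by position, so it depends on the months being stored in calendar order. As a hedged alternative, the sketch below selects the same rows by month name with a boolean mask. The names `second_quarter` and `q2_income` are assumptions introduced for this example, and the DataFrame is rebuilt here only so that the block runs on its own.

```python
import pandas as pd

# Rebuilding the six-month table so the example is self-contained
company = pd.DataFrame({
    'Month': ['January', 'February', 'March', 'April', 'May', 'June'],
    'Income': [23500, 19700, 31150, 19800, 25600, 31200],
})

# Months that make up the second quarter
second_quarter = ['April', 'May', 'June']

# Keep only rows whose Month is in the list, then take the Income column
q2_income = company.loc[company['Month'].isin(second_quarter), 'Income']
print(q2_income)
```

Selecting by value keeps working even if the rows are later sorted or reordered.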

Display descriptive statistics for the income earned in the months of the second quarter.

In [43]:
# statistics for rows at positions 3, 4 and 5 (April, May, June)
company['Income'].iloc[3:6].describe()

count        3.000000
mean     25533.333333
std       5700.292390
min      19800.000000
25%      22700.000000
50%      25600.000000
75%      28400.000000
max      31200.000000
Name: Income, dtype: float64

## Creating DataFrame from 2D List

Instead of adding each column separately, you can create a DataFrame from a two-dimensional (2D) list, in which each inner list is one row. Because the list carries no column labels, you need to pass the column names separately.

In [53]:
# creating data collection as 2D list
company_data = [
    ['January',23500],
    ['February',19700],
    ['March',31150]
    ]

# creating DataFrame with column names
company = pd.DataFrame(data=company_data, columns=['Month','Income'])

# displaying DataFrame contents
company

Index | Month    | Income
------|----------|-------
0     | January  | 23500
1     | February | 19700
2     | March    | 31150
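
To illustrate the note about column names, the sketch below builds the same DataFrame without the `columns` argument. pandas then labels the columns with integers, and the names can be assigned afterwards. The variable names `unnamed` and `default_labels` are assumptions made for this example.

```python
import pandas as pd

company_data = [
    ['January', 23500],
    ['February', 19700],
    ['March', 31150]
    ]

# Without column names, pandas falls back to integer labels 0, 1, ...
unnamed = pd.DataFrame(data=company_data)
default_labels = list(unnamed.columns)
print(default_labels)

# The names can still be assigned after the DataFrame is created
unnamed.columns = ['Month', 'Income']
print(list(unnamed.columns))
```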


### Tasks

The table below lists the university's students. 

StudentID | Name  | Surname | Age | Program
----------|-------|---------|-----|-----------------
902311    | Peter | Red     | 21  | Accounting
915027    | Sofia | White   | 19  | Computer Science
900004    | Jack  | Grey    | 24  | Accounting
994031    | Mark  | Brown   | 22  | Engineering

Create a DataFrame using a 2D list. Then, display the contents of the DataFrame.

In [54]:
# creating data collection as 2D list
university_data = [
    [902311, 'Peter', 'Red', 21, 'Accounting'],
    [915027, 'Sofia', 'White', 19, 'Computer Science'],
    [900004, 'Jack', 'Grey', 24, 'Accounting'],
    [994031, 'Mark', 'Brown', 22, 'Engineering']
    ]

# creating DataFrame with column names
university = pd.DataFrame(
    data=university_data,
    columns=['StudentID', 'Name', 'Surname', 'Age', 'Program']
    )

# displaying DataFrame contents
university

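Index | StudentID | Name  | Surname | Age | Program
------|-----------|-------|---------|-----|-----------------
0     | 902311    | Peter | Red     | 21  | Accounting
1     | 915027    | Sofia | White   | 19  | Computer Science
2     | 900004    | Jack  | Grey    | 24  | Accounting
3     | 994031    | Mark  | Brown   | 22  | Engineering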


Calculate and display the average age of students.

In [56]:
university['Age'].mean()

21.5

## Creating DataFrame from Dictionary

A dictionary stores key-value pairs, with each key separated from its value by a colon. When you build a DataFrame from a dictionary, each pair becomes one column: the key is the column name and the value is the list of data for that column. Below is an example of creating a DataFrame based on a dictionary.

In [57]:
# creating data collection as a dictionary
company_data = {
    'Month':['January','February','March'],
    'Income':[23500,19700,31150]
    }

# creating DataFrame
company = pd.DataFrame(data=company_data)

# displaying DataFrame contents
company

Index | Month    | Income
------|----------|-------
0     | January  | 23500
1     | February | 19700
2     | March    | 31150


### Tasks

Complete the DataFrame by adding a 'Tax' column along with the following values: 1200, 2350, 995. Then, display DataFrame contents.

In [60]:
# adding the new column directly to the existing DataFrame
company['Tax'] = [1200, 2350, 995]

# displaying DataFrame contents
company

Index | Month    | Income | Tax
------|----------|--------|-----
0     | January  | 23500  | 1200
1     | February | 19700  | 2350
2     | March    | 31150  | 995


Display descriptive statistics for the company income and tax.

In [61]:
company['Income'].describe()

count        3.000000
mean     24783.333333
std       5831.880772
min      19700.000000
25%      21600.000000
50%      23500.000000
75%      27325.000000
max      31150.000000
Name: Income, dtype: float64

In [62]:
company['Tax'].describe()

count       3.000000
mean     1515.000000
std       730.359501
min       995.000000
25%      1097.500000
50%      1200.000000
75%      1775.000000
max      2350.000000
Name: Tax, dtype: float64
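
The two calls above can also be combined. As a minimal sketch, selecting both columns with a list of names and calling `describe()` once returns the statistics side by side. The variable name `stats` is an assumption made for this example, and the DataFrame is rebuilt here only so that the block runs on its own.

```python
import pandas as pd

# Rebuilding the three-month table so the example is self-contained
company = pd.DataFrame({
    'Month': ['January', 'February', 'March'],
    'Income': [23500, 19700, 31150],
    'Tax': [1200, 2350, 995],
})

# A list of column names selects several columns at once
stats = company[['Income', 'Tax']].describe()
print(stats)
```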

## Creating DataFrame from File

Creating a DataFrame from the data in a CSV file takes a single call to the pandas read_csv() function. By default, it uses the first row of the file as the column names.

In [64]:
sales = pd.read_csv('product_sales.csv')
sales

Index | SaleRep          | Region | Orders | TotalSales
------|------------------|--------|--------|-----------
0     | Felice Lunck     | West   | 218    | 44489
1     | Doralynn Pesak   | West   | 233    | 61035
2     | Madelle Martland | East   | 264    | 62603
3     | Yasmin Myhan     | South  | 110    | 59377
4     | Marmaduke Webbe  | East   | 188    | 78771
5     | Christiano Vero  | East   | 265    | 68506
6     | Cecelia Jealous  | West   | 93     | 53634
7     | Isaak Housiaux   | East   | 189    | 62455
8     | Derril Howland   | East   | 385    | 73460
9     | Judon Allom      | West   | 230    | 51067
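
If the product_sales.csv file is not at hand, the same call can be tried on CSV text held in memory. The sketch below assumes the file has a header row with these four column names and reuses the first three rows shown above. The exact layout of the real file is an assumption, and `csv_text` and `sample` are names introduced for this example.

```python
import io

import pandas as pd

# CSV content as a string; the layout of the real file is assumed
csv_text = """SaleRep,Region,Orders,TotalSales
Felice Lunck,West,218,44489
Doralynn Pesak,West,233,61035
Madelle Martland,East,264,62603
"""

# read_csv() accepts a file-like object as well as a file path
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
```

The header row becomes the column names, and the numeric columns are parsed as integers, so methods such as `mean()` and `sum()` work on them directly.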


### Tasks

For sales data, calculate and display the average number of orders.

In [65]:
sales['Orders'].mean()

217.5

For sales data, calculate and display the total sales value.

In [66]:
sales['TotalSales'].sum()

615397

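The same function loads the continents file. The cells below display its Continent column and the descriptive statistics for its numeric columns.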

In [69]:
df2 = pd.read_csv('continents.csv')
df2['Continent']


0           Europe
1             Asia
2    North America
3          Oceania
4           Africa
Name: Continent, dtype: object

In [70]:
df2.describe()

Statistic | Area       | Population
----------|------------|-------------
count     | 5.0        | 5.0
mean      | 23543990.0 | 1494740000.0
std       | 15003120.0 | 1852185000.0
min       | 8510926.0  | 44491720.0
25%       | 10000000.0 | 595783500.0
50%       | 24230000.0 | 745173800.0
75%       | 30365000.0 | 1393676000.0
max       | 44614000.0 | 4694576000.0
