# DATAFRAME
---

A DataFrame is a 2-dimensional data structure of rows and columns, similar to a spreadsheet.

## Creating DataFrame from Lists

You can build a DataFrame column by column. Start with an empty DataFrame, then assign a list to each new column name; every list must contain the same number of values.

In [13]:
import pandas as pd

In [30]:
#creating an empty DataFrame
company = pd.DataFrame()

# creating data collections as lists
company_months = ['January','February','March']
company_income = [23500,19700,31150]

company['Months'] = company_months
company['Income'] = company_income



# displaying DataFrame contents
print(company)

     Months  Income
0   January   23500
1  February   19700
2     March   31150
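
When a list is assigned as a new column, its length has to match the number of rows already in the DataFrame. The sketch below reuses the example data and adds a hypothetical, too-short `short_tax` list (not part of the original exercise) to show pandas rejecting the mismatch with a `ValueError`.

```python
import pandas as pd

company = pd.DataFrame()
company['Months'] = ['January', 'February', 'March']
company['Income'] = [23500, 19700, 31150]

# hypothetical column with only two values for three rows
short_tax = [1200, 2350]

try:
    company['Tax'] = short_tax
    mismatch_rejected = False
except ValueError as error:
    # pandas refuses the assignment, so no 'Tax' column is created
    mismatch_rejected = True
    print('Could not add column:', error)

print(company.columns.to_list())
```

Checking the list lengths before assigning them is usually simpler than catching the error afterwards.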


### Tasks

Complete the data by adding the remaining months of the first half of the year and the income earned in these months. Then, display the data again.

In [29]:
#creating an empty DataFrame
company = pd.DataFrame()

# creating data collections as lists
company_months = ['January','February','March', 'April','May', 'June']
company_income = [23500,19700,31150,123123,123123,123123 ]

company['Months'] = company_months
company['Income'] = company_income



# displaying DataFrame contents
print(company)

     Months  Income
0   January   23500
1  February   19700
2     March   31150
3     April  123123
4       May  123123
5      June  123123


Display descriptive statistics for the income earned.

In [28]:
company.describe()

             Income
count           6.0
mean   73953.166667
std    53988.992817
min         19700.0
25%         25412.5
50%         77136.5
75%        123123.0
max        123123.0
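
Each row of the describe() result is also available as a separate method on the column. The following is a minimal sketch using the same six months of data; the variable name `income` is introduced here only for illustration.

```python
import pandas as pd

company = pd.DataFrame()
company['Months'] = ['January', 'February', 'March', 'April', 'May', 'June']
company['Income'] = [23500, 19700, 31150, 123123, 123123, 123123]

# selecting a single column gives a Series
income = company['Income']

# each of these values also appears as a row in company.describe()
print('count :', income.count())
print('mean  :', income.mean())
print('median:', income.median())  # the 50% row
print('min   :', income.min())
print('max   :', income.max())
```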


Display company income for the months of the second quarter.

In [35]:
#creating an empty DataFrame
company = pd.DataFrame()

# creating data collections as lists
company_months = ['January','February','March','April','May','June','July','August','September','October','November','December']
company_income = [23500,19700,31150,123123,123123,123123, 5444,6734,43434,43434,2323,2394]

company['Months'] = company_months
company['Income'] = company_income
company[3:6]

   Months  Income
3   April  123123
4     May  123123
5    June  123123
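
The slice `company[3:6]` selects rows by position and excludes the end position. As a sketch of two equivalent approaches, the example below uses `.iloc` for the same positional slice and `isin()` to pick the rows by month name instead; the names `by_position`, `by_name` and `second_quarter` are illustrative only.

```python
import pandas as pd

company = pd.DataFrame()
company['Months'] = ['January', 'February', 'March', 'April', 'May', 'June']
company['Income'] = [23500, 19700, 31150, 123123, 123123, 123123]

# positions 3, 4 and 5; the end position 6 is excluded
by_position = company.iloc[3:6]

# the same rows selected by month name instead of position
second_quarter = ['April', 'May', 'June']
by_name = company[company['Months'].isin(second_quarter)]

print(by_position)
print(by_name.equals(by_position))
```

Selecting by name keeps working if the rows are later reordered, while the positional slice depends on the months being stored in calendar order.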


Display descriptive statistics for the income earned in the months of the second quarter.

In [34]:
company[3:6].describe()

         Income
count       3.0
mean   123123.0
std         0.0
min    123123.0
25%    123123.0
50%    123123.0
75%    123123.0
max    123123.0


## Creating DataFrame from 2D List

Instead of adding each column separately, you can create a DataFrame from a two-dimensional (2D) list in which each inner list is one row. The inner lists carry no column names, so you need to supply them through the columns parameter.

In [None]:
# creating data collection as 2D list
company_data = [
    ['January',23500],
    ['February',19700],
    ['March',31150]
    ]

# creating DataFrame with column names
company = pd.DataFrame(data=company_data, columns=['Month','Income'])

# displaying DataFrame contents
company
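
If the column names are left out, pandas numbers the columns starting from 0. The sketch below shows this default and one way to name the columns afterwards; the variable `unnamed` is introduced only for this illustration.

```python
import pandas as pd

company_data = [
    ['January', 23500],
    ['February', 19700],
    ['March', 31150],
]

# without column names, pandas labels the columns 0, 1, ...
unnamed = pd.DataFrame(data=company_data)
print(unnamed.columns.to_list())

# names can also be assigned after creation
unnamed.columns = ['Month', 'Income']
print(unnamed.columns.to_list())
```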

### Tasks

The table below lists the university's students. 

StudentID | Name        | Surname      | Age | Program
----------|-------------|--------------|-----|-----------
902311    | Peter       | Red          | 21  | Accounting   
915027    | Sofia       | White        | 19  | Computer Science
900004    | Jack        | Grey         | 24  | Accounting
994031    | Mark        | Brown        | 22  | Engineering         

Create a DataFrame using a 2D list. Then, display the contents of the DataFrame.

In [45]:
# creating data collection as 2D list
students_data = [
    ['902311','Peter', 'Red', 21, "Accounting"],
    ['915027','Sofia', 'White', 19, "Computer Science"],
    ['900004','Jack', 'Grey', 24, "Accounting"],
    ['994031','Mark', 'Brown', 22, "Engineering"],
]
 
# creating DataFrame with column names
students = pd.DataFrame(data=students_data, columns=['StudentID','Name','Surname','Age','Program'])

# displaying DataFrame contents
students

   StudentID   Name  Surname  Age           Program
0     902311  Peter      Red   21        Accounting
1     915027  Sofia    White   19  Computer Science
2     900004   Jack     Grey   24        Accounting
3     994031   Mark    Brown   22       Engineering


Calculate and display the average age of students.

In [46]:
students['Age'].mean()

21.5

## Creating DataFrame from Dictionary

A dictionary stores key-value pairs, with each key separated from its value by a colon. When a DataFrame is created from a dictionary, each pair becomes one column: the key is the column name and the value is the column's data, given as a list. Below is an example of creating a DataFrame based on a dictionary.

In [None]:
# creating data collection as a dictionary
company_data = {
    'Month':['January','February','March'],
    'Income':[23500,19700,31150]
    }

#creating DataFrame
company = pd.DataFrame(data=company_data)

#displaying DataFrame contents
company
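
The dictionary above is column-oriented: one key per column. As an alternative layout that this section does not otherwise use, pandas also accepts a list of dictionaries with one dictionary per row. The sketch below builds the same table both ways; the names `by_column` and `by_row` are illustrative.

```python
import pandas as pd

# column-oriented: one key per column
by_column = pd.DataFrame(data={
    'Month': ['January', 'February', 'March'],
    'Income': [23500, 19700, 31150],
})

# row-oriented: one dictionary per row
by_row = pd.DataFrame(data=[
    {'Month': 'January', 'Income': 23500},
    {'Month': 'February', 'Income': 19700},
    {'Month': 'March', 'Income': 31150},
])

print(by_row)

# comparing the contents of the two DataFrames
print(by_row.equals(by_column))
```

The row-oriented form can be convenient when records arrive one at a time, while the column-oriented form matches the way the earlier examples add columns.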

### Tasks

Complete the DataFrame by adding a 'Tax' column with the following values: 1200, 2350, 995. Then, display the DataFrame contents.

In [39]:
# creating data collection as a dictionary
company_data = {
    'Month':['January','February','March'],
    'Income':[23500,19700,31150],
    'Tax': [1200,2350,995]
    }

#creating DataFrame
company = pd.DataFrame(data=company_data)

#displaying DataFrame contents
company

      Month  Income   Tax
0   January   23500  1200
1  February   19700  2350
2     March   31150   995


Display descriptive statistics for the company income and tax.

In [47]:
# calculating the mean income (sum of the values divided by their count)
print(company['Income'].mean())

# calculating the mean tax in the same way
print(company['Tax'].mean())


24783.333333333332
1515.0
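
The two means above are only part of what the describe() function reports. As a sketch, the example below selects both numeric columns with a list of names and summarises them in one call; the variable name `stats` is introduced here for illustration.

```python
import pandas as pd

company = pd.DataFrame(data={
    'Month': ['January', 'February', 'March'],
    'Income': [23500, 19700, 31150],
    'Tax': [1200, 2350, 995],
})

# selecting two columns with a list of names, then summarising them
stats = company[['Income', 'Tax']].describe()
print(stats)

# a single statistic can be read from the result by row and column label
print(stats.loc['mean', 'Tax'])
```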


## Creating DataFrame from File

Creating a DataFrame from the data in a CSV file takes a single call to the read_csv() function, which accepts the path to the file.

In [48]:
sales = pd.read_csv('product_sales.csv')
sales

            SaleRep  Region  Orders  TotalSales
0      Felice Lunck    West     218       44489
1    Doralynn Pesak    West     233       61035
2  Madelle Martland    East     264       62603
3      Yasmin Myhan   South     110       59377
4   Marmaduke Webbe    East     188       78771
5   Christiano Vero    East     265       68506
6   Cecelia Jealous    West      93       53634
7    Isaak Housiaux    East     189       62455
8    Derril Howland    East     385       73460
9       Judon Allom    West     230       51067
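
To try read_csv() without the file on disk, the CSV text can be supplied from memory. The sketch below assumes the file has a header row and comma-separated values, which the original file is not shown to confirm, and uses only the first three rows from the output above; `csv_text` and `sales_sample` are illustrative names.

```python
import io
import pandas as pd

# stand-in for a file on disk: three rows from the output above, as CSV text
csv_text = """SaleRep,Region,Orders,TotalSales
Felice Lunck,West,218,44489
Doralynn Pesak,West,233,61035
Madelle Martland,East,264,62603
"""

# read_csv accepts a file path or a file-like object such as StringIO
sales_sample = pd.read_csv(io.StringIO(csv_text))

print(sales_sample)

# the column types are inferred from the text
print(sales_sample.dtypes)
```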


### Tasks

For sales data, calculate and display the average number of orders.

In [49]:
sales['Orders'].mean()

217.5

For sales data, calculate and display the total sales value.

In [50]:
sales['TotalSales'].sum()

615397

The continents.csv file contains information about the area and population of each continent. Based on this data, create a DataFrame. Display the list of continents along with descriptive statistics.

In [52]:
# creating DataFrame from the CSV file and displaying its contents
continents = pd.read_csv('continents.csv')
continents

       Continent      Area  Population
0         Europe  10000000   745173774
1           Asia  44614000  4694576167
2  North America  24230000   595783465
3        Oceania   8510926    44491724
4         Africa  30365000  1393676444

In [14]:
import pandas as pd

# creating DataFrame from the CSV file
continents = pd.read_csv('continents.csv')

# displaying descriptive statistics for the numeric columns
continents.describe()

             Area    Population
count         5.0           5.0
mean   23543990.0  1494740000.0
std    15003120.0  1852185000.0
min     8510926.0    44491720.0
25%    10000000.0   595783500.0
50%    24230000.0   745173800.0
75%    30365000.0  1393676000.0
max    44614000.0  4694576000.0
