## Pandas Practice
(information from https://python-programming.quantecon.org/pandas.html)


In [109]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests

In [110]:
#creating a series of four random observations
s = pd.Series(np.random.randn(4), name='daily returns')
s

0    1.645012
1   -1.318412
2   -1.671678
3   -0.845789
Name: daily returns, dtype: float64

In [111]:
s * 100

0    164.501170
1   -131.841155
2   -167.167778
3    -84.578878
Name: daily returns, dtype: float64

In [112]:
np.abs(s)

0    1.645012
1    1.318412
2    1.671678
3    0.845789
Name: daily returns, dtype: float64

In [113]:
# Summary statistics for the Series: count, mean, standard deviation, min, quartiles, and max
s.describe()

count    4.000000
mean    -0.547717
std      1.500463
min     -1.671678
25%     -1.406728
50%     -1.082100
75%     -0.223089
max      1.645012
Name: daily returns, dtype: float64
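As a rough cross-check of what `describe()` reports, the sketch below recomputes the mean, standard deviation, and median by hand. It reuses the four values printed above instead of drawing new random numbers; the name `s_demo` and the `manual_*` variables are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# The four values displayed above, hard-coded so the check is reproducible
s_demo = pd.Series([1.645012, -1.318412, -1.671678, -0.845789],
                   name='daily returns')

summary = s_demo.describe()

# Mean: sum divided by the number of observations
manual_mean = s_demo.sum() / len(s_demo)

# Standard deviation: pandas uses the sample formula (divides by n - 1)
manual_std = np.sqrt(((s_demo - manual_mean) ** 2).sum() / (len(s_demo) - 1))

# The '50%' row is the median
manual_median = s_demo.median()

print(np.isclose(summary['mean'], manual_mean))
print(np.isclose(summary['std'], manual_std))
print(np.isclose(summary['50%'], manual_median))
```

All three comparisons should print `True`, which shows that `std` is the sample (not population) standard deviation.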

In [114]:
# Replace the default 0-based integer index with ticker labels
s.index = ['AMZN', 'AAPL', 'MSFT', 'GOOG']
s

AMZN    1.645012
AAPL   -1.318412
MSFT   -1.671678
GOOG   -0.845789
Name: daily returns, dtype: float64

In [115]:
# Zero out AMZN (values can be changed directly by label)
s['AMZN'] = 0
s

AMZN    0.000000
AAPL   -1.318412
MSFT   -1.671678
GOOG   -0.845789
Name: daily returns, dtype: float64

In [116]:
# Check whether a label is in the index of the Series
'AAPL' in s

True
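Note that `in` applied to a Series tests the index labels, not the stored values. A minimal sketch of the difference, using made-up numbers (the Series `s_demo` and its values are illustrative assumptions, not the random draws above):

```python
import pandas as pd

# Illustrative values only; the tickers match the example above
s_demo = pd.Series([0.0, -1.3, -1.7, -0.8],
                   index=['AMZN', 'AAPL', 'MSFT', 'GOOG'],
                   name='daily returns')

# Membership on the Series itself is checked against the index labels
label_found = 'AAPL' in s_demo

# -1.3 is a stored value, not a label, so this is False
value_found = -1.3 in s_demo

# To search the stored values, test against the underlying array
value_in_values = -1.3 in s_demo.values

print(label_found, value_found, value_in_values)
```

To test values in a vectorized way, `s_demo.isin([-1.3]).any()` is an alternative to searching `.values`.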

## DataFrames
While a Series is a single column of data, a DataFrame is several columns, one for each variable.

In essence, a DataFrame in pandas is analogous to a (highly optimized) Excel spreadsheet.

Thus, it is a powerful tool for representing and analyzing data that are naturally organized into rows and columns, often with descriptive indexes for individual rows and individual columns.
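To make the "several columns" idea concrete, here is a small sketch that builds a DataFrame in memory from a dictionary of columns. The name `df_demo` is illustrative; the two rows are copied from the Penn World Tables extract shown later in this notebook.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values
df_demo = pd.DataFrame({
    'country': ['Argentina', 'Australia'],
    'POP': [37335.653, 19053.186],
    'tcgdp': [295072.2, 541804.7],
})

# Selecting a single column returns a Series
col = df_demo['tcgdp']

print(type(col).__name__)
print(df_demo.shape)  # (number of rows, number of columns)
```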

Let’s look at an example that reads data from the CSV file pandas/data/test_pwt.csv, which is taken from the Penn World Tables.

The dataset contains the following indicators:

| Variable Name | Description |
| --- | --- |
| POP | Population (in thousands) |
| XRAT | Exchange Rate to US Dollar |
| tcgdp | Total PPP Converted GDP (in millions of international dollars) |
| cc | Consumption Share of PPP Converted GDP Per Capita (%) |
| cg | Government Consumption Share of PPP Converted GDP Per Capita (%) |

In [117]:
# Read the data from a URL using the pandas function read_csv
df = pd.read_csv('https://raw.githubusercontent.com/QuantEcon/lecture-python-programming/master/source/_static/lecture_specific/pandas/data/test_pwt.csv')
type(df)

pandas.core.frame.DataFrame

In [118]:
# Peek at the contents
df

Unnamed: 0,country,country isocode,year,POP,XRAT,tcgdp,cc,cg
0,Argentina,ARG,2000,37335.653,0.9995,295072.2,75.716805,5.578804
1,Australia,AUS,2000,19053.186,1.72483,541804.7,67.759026,6.720098
2,India,IND,2000,1006300.297,44.9416,1728144.0,64.575551,14.072206
3,Israel,ISR,2000,6114.57,4.07733,129253.9,64.436451,10.266688
4,Malawi,MWI,2000,11801.505,59.543808,5026.222,74.707624,11.658954
5,South Africa,ZAF,2000,45064.098,6.93983,227242.4,72.71871,5.726546
6,United States,USA,2000,282171.957,1.0,9898700.0,72.347054,6.032454
7,Uruguay,URY,2000,3219.793,12.099592,25255.96,78.97874,5.108068


In [119]:
# Select rows by position with a slice (the end position is excluded)
# Useful for working with a subset of the data, for example a range of rows of particular interest
df[2:5]

Unnamed: 0,country,country isocode,year,POP,XRAT,tcgdp,cc,cg
2,India,IND,2000,1006300.297,44.9416,1728144.0,64.575551,14.072206
3,Israel,ISR,2000,6114.57,4.07733,129253.9,64.436451,10.266688
4,Malawi,MWI,2000,11801.505,59.543808,5026.222,74.707624,11.658954


In [120]:
# Or select specific columns rather than specific rows by passing a list of column names
df[['country', 'tcgdp']]

Unnamed: 0,country,tcgdp
0,Argentina,295072.2
1,Australia,541804.7
2,India,1728144.0
3,Israel,129253.9
4,Malawi,5026.222
5,South Africa,227242.4
6,United States,9898700.0
7,Uruguay,25255.96


In [121]:
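# To select rows and columns by integer position, use .iloc[rows, columns].
# Only the last expression in a cell is displayed, so this result is not shown below.
df.iloc[2:5, 0:4]
# To select rows by position and columns by name, pass index labels and column names to .loc.
df.loc[df.index[2:5], ['country', 'tcgdp']]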

Unnamed: 0,country,tcgdp
2,India,1728144.0
3,Israel,129253.9
4,Malawi,5026.222
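One detail worth keeping in mind when switching between the two accessors: `.iloc` slices exclude the end position, while `.loc` slices include the end label. The sketch below illustrates this on a small in-memory frame (`df_demo` is an illustrative name; its rows are copied from the output above) so that it runs without a network connection.

```python
import pandas as pd

# First five rows of the country and tcgdp columns shown above
df_demo = pd.DataFrame({
    'country': ['Argentina', 'Australia', 'India', 'Israel', 'Malawi'],
    'tcgdp': [295072.2, 541804.7, 1728144.0, 129253.9, 5026.222],
})

# Position-based: rows 2, 3, 4 and columns 0, 1 (end positions excluded)
by_position = df_demo.iloc[2:5, 0:2]

# Label-based: index labels 2 through 4, with the end label included
by_label = df_demo.loc[2:4, ['country', 'tcgdp']]

print(by_position.equals(by_label))
```

With the default integer index the two selections coincide, which is why `2:5` in `.iloc` corresponds to `2:4` in `.loc`.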
