# Intro to Pandas

pandas is built on top of NumPy, and its plotting methods use Matplotlib. It stores data in rectangular data frames; R and SQL have similar data structures. Every value in a column must have the same datatype, though different columns can have different datatypes.
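As a minimal sketch of the one-datatype-per-column rule, the snippet below builds a tiny frame by hand and prints the datatype pandas picks for each column. The frame `toy` and the Series `mixed` are made up for illustration and are not read from the portfolio file.

```python
import pandas as pd

# Hypothetical toy data, typed by hand for illustration
toy = pd.DataFrame({
    "Product": ["AIRBUS SE", "BAE SYS."],  # text (reported as object in older pandas versions)
    "Amount": [3, 20],                     # whole numbers -> integer dtype
    "Closing": [106.38, 741.2],            # decimals -> float64
})
print(toy.dtypes)

# A column cannot hold an int and a float side by side:
# pandas upcasts the whole column to float64 instead
mixed = pd.Series([1, 2.5])
print(mixed.dtype)
```

Depending on the pandas version, the text column may be reported as `object` or as a dedicated string datatype.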

We can see the first few rows using the **.head()** method.

**.info()** shows the names and datatypes of the columns, along with the number of non-null values in each.

**.shape** gives the number of rows and columns. As this is an attribute and not a method, we write it without parentheses.

**.describe()** gives some headline statistics for the numeric columns of the data frame.

In [1]:
import pandas as pd
import numpy as np 

portfolio = pd.read_csv("/data/workspace_files/investment_portfolio.csv")
portfolio.head()

Unnamed: 0,Product,Symbol/ISIN,Amount,Closing,Local value,Value in GBP
0,ADVANCED MICRO DEVICES,US0079031078,2,95.12,USD 190.24,155.14
1,AIRBUS SE,NL0000235190,3,106.38,EUR 319.14,270.96
2,ALPHABET INC. - CLASS C,US02079K1079,1,2330.31,USD 2330.31,1900.38
3,AMAZON.COM INC. - COM,US0231351067,1,2261.1,USD 2261.10,1843.93
4,BAE SYS.,GB0002634946,20,741.2,GBX 14824.00,148.24


In [2]:
portfolio.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 91 entries, 0 to 90
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Product       91 non-null     object 
 1   Symbol/ISIN   83 non-null     object 
 2   Amount        91 non-null     int64  
 3   Closing       91 non-null     float64
 4   Local value   91 non-null     object 
 5   Value in GBP  91 non-null     float64
dtypes: float64(2), int64(1), object(3)
memory usage: 4.4+ KB


In [3]:
portfolio.shape

(91, 6)

In [4]:
portfolio.describe()

Unnamed: 0,Amount,Closing,Value in GBP
count,91.0,91.0,91.0
mean,240.142857,1598.079231,1976.114615
std,821.312819,3599.125363,6432.638929
min,-1.0,1.27,-22558.87
25%,2.5,98.8,157.08
50%,12.0,402.48,521.4
75%,80.5,2022.25,2264.25
max,7000.0,24698.0,32832.17


A data frame has three components: **values, columns and index**. Each is accessed through the attribute of the same name. Pandas does not follow the Python ethos of there being one obvious way to do things, so there are often several ways to do the same task, and things can get quite complicated.

In [5]:
portfolio.values

In [6]:
portfolio.columns

Index(['Product', 'Symbol/ISIN', 'Amount', 'Closing', 'Local value',
       'Value in GBP'],
      dtype='object')

In [7]:
portfolio.index

RangeIndex(start=0, stop=91, step=1)
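To make the three components more concrete, here is a small sketch on a hand-typed frame that reuses two rows from the `.head()` output above. The name `small` is an assumption for illustration. It also shows `.to_numpy()`, which is one example of pandas offering a second route to the same result.

```python
import numpy as np
import pandas as pd

# Two rows copied by hand from the .head() output above
small = pd.DataFrame({
    "Product": ["ADVANCED MICRO DEVICES", "AIRBUS SE"],
    "Amount": [2, 3],
    "Value in GBP": [155.14, 270.96],
})

values = small.values    # 2-D NumPy array holding the data
columns = small.columns  # Index object holding the column labels
index = small.index      # RangeIndex holding the row labels

print(type(values).__name__)  # ndarray
print(values.shape)           # (2, 3), the same as small.shape
print(list(columns))
print(list(index))            # [0, 1]

# .to_numpy() is another way to get the same array
same = small.to_numpy()
print((same == values).all())
```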

# Sorting and Subsetting In Pandas

There are two key ways to find parts of your data frame: sorting and subsetting.

Rows can be sorted using the **.sort_values()** method. We can sort by multiple columns by passing a list of column names, and **ascending** accepts either a single boolean or a list with one boolean per column.

If we select a single column with one set of brackets, we get a Series containing that column back. If we want multiple columns, we have to use two sets of brackets. This works because the inner brackets are just a Python list of column names, which means we can pass a list stored in a variable to do the same thing.

In [8]:
portfolio.sort_values("Value in GBP", ascending=False)

Unnamed: 0,Product,Symbol/ISIN,Amount,Closing,Local value,Value in GBP
83,SPDR S&P 500 $,IE00B6YX5C33,100,402.60,USD 40260.00,32832.17
73,POWERSHS EQQQ,IE0032077012,110,24698.00,GBX 2716780.00,27167.80
23,ES 3400 P 18DEC26,,1,357.00,USD 17850.00,14556.74
24,ES 3400 P 19DEC25,,1,334.50,USD 16725.00,13639.30
47,ISHR EU PROP,IE00B0M63284,400,3013.00,GBX 1205200.00,12052.00
...,...,...,...,...,...,...
11,CANOPY GROWTH CORPORATION COMMO...,CA1380351009,12,5.92,USD 71.04,57.93
25,ES 3600 P 17JUN22,,-1,27.25,USD -1362.50,-1111.12
26,ES 3800 P 16DEC22,,-1,218.75,USD -10937.50,-8919.57
28,ES 4000 P 19DEC25,,-1,526.50,USD -26325.00,-21468.13


In [9]:
portfolio.sort_values(["Amount", "Closing"], ascending=False)

Unnamed: 0,Product,Symbol/ISIN,Amount,Closing,Local value,Value in GBP
8,BLUEFIELD SOLAR,GG00BB0RDB98,7000,133.00,GBX 931000.00,9310.00
34,HEND.FAR EAST,JE00B1GXH751,2000,290.00,GBX 580000.00,5800.00
75,REAL EST.CRED,GB00B0HW5366,1920,150.75,GBX 289440.00,2894.40
61,ISHR UK PROP,IE00B1TXLS18,1500,589.50,GBX 884250.00,8842.50
6,BLACKSTONE GSO£,JE00BNCB5T53,1485,64.00,GBX 95040.00,950.40
...,...,...,...,...,...,...
21,ES 3000 P 17JUN22,,1,3.90,USD 195.00,159.02
27,ES 4000 P 18DEC26,,-1,553.25,USD -27662.50,-22558.87
28,ES 4000 P 19DEC25,,-1,526.50,USD -26325.00,-21468.13
26,ES 3800 P 16DEC22,,-1,218.75,USD -10937.50,-8919.57


In [10]:
portfolio.sort_values(["Amount", "Closing"], ascending=[False, True])

Unnamed: 0,Product,Symbol/ISIN,Amount,Closing,Local value,Value in GBP
8,BLUEFIELD SOLAR,GG00BB0RDB98,7000,133.00,GBX 931000.00,9310.00
34,HEND.FAR EAST,JE00B1GXH751,2000,290.00,GBX 580000.00,5800.00
75,REAL EST.CRED,GB00B0HW5366,1920,150.75,GBX 289440.00,2894.40
61,ISHR UK PROP,IE00B1TXLS18,1500,589.50,GBX 884250.00,8842.50
6,BLACKSTONE GSO£,JE00BNCB5T53,1485,64.00,GBX 95040.00,950.40
...,...,...,...,...,...,...
2,ALPHABET INC. - CLASS C,US02079K1079,1,2330.31,USD 2330.31,1900.38
25,ES 3600 P 17JUN22,,-1,27.25,USD -1362.50,-1111.12
26,ES 3800 P 16DEC22,,-1,218.75,USD -10937.50,-8919.57
28,ES 4000 P 19DEC25,,-1,526.50,USD -26325.00,-21468.13
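The sorted outputs above keep the original row labels (83, 73, 23 and so on), and `portfolio` itself is not reordered. A small sketch of that behaviour, using three rows typed by hand from the `.head()` output and an assumed name `small`:

```python
import pandas as pd

# Three rows copied by hand from the .head() output
small = pd.DataFrame({
    "Product": ["BAE SYS.", "AIRBUS SE", "ADVANCED MICRO DEVICES"],
    "Amount": [20, 3, 2],
    "Value in GBP": [148.24, 270.96, 155.14],
})

# sort_values returns a new data frame; `small` keeps its original order
by_value = small.sort_values("Value in GBP", ascending=False)
print(by_value["Product"].tolist())  # ['AIRBUS SE', 'ADVANCED MICRO DEVICES', 'BAE SYS.']
print(small["Product"].tolist())     # ['BAE SYS.', 'AIRBUS SE', 'ADVANCED MICRO DEVICES']

# The row labels travel with the rows; reset_index(drop=True) renumbers them
renumbered = by_value.reset_index(drop=True)
print(by_value.index.tolist())    # [1, 2, 0]
print(renumbered.index.tolist())  # [0, 1, 2]
```

To keep a sorted version, assign the result to a name, as done with `by_value` here.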


In [11]:
portfolio["Product"]

In [12]:
portfolio[["Product", "Value in GBP"]]

Unnamed: 0,Product,Value in GBP
0,ADVANCED MICRO DEVICES,155.14
1,AIRBUS SE,270.96
2,ALPHABET INC. - CLASS C,1900.38
3,AMAZON.COM INC. - COM,1843.93
4,BAE SYS.,148.24
...,...,...
86,UNILEVER,112.41
87,VOLTA FIN,3005.60
88,WALT DISNEY COMPANY (T,262.58
89,XTRACKERS MSCI SINGAPORE UCITS ...,1294.35


In [13]:
columns_to_subset = ["Product", "Symbol/ISIN", "Amount"]

portfolio[columns_to_subset]

Unnamed: 0,Product,Symbol/ISIN,Amount
0,ADVANCED MICRO DEVICES,US0079031078,2
1,AIRBUS SE,NL0000235190,3
2,ALPHABET INC. - CLASS C,US02079K1079,1
3,AMAZON.COM INC. - COM,US0231351067,1
4,BAE SYS.,GB0002634946,20
...,...,...,...
86,UNILEVER,GB00B10RZP78,3
87,VOLTA FIN,GG00B1GHHH78,600
88,WALT DISNEY COMPANY (T,US2546871060,3
89,XTRACKERS MSCI SINGAPORE UCITS ...,LU0659578842,1200
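The difference between one and two sets of brackets is easiest to see by checking the type of what comes back. A minimal sketch, using a hand-typed frame with two rows from the output above (the name `small` is an assumption for illustration):

```python
import pandas as pd

small = pd.DataFrame({
    "Product": ["ADVANCED MICRO DEVICES", "AIRBUS SE"],
    "Amount": [2, 3],
    "Value in GBP": [155.14, 270.96],
})

one_column = small["Product"]                     # single brackets -> Series
one_column_frame = small[["Product"]]             # list with one name -> DataFrame
two_columns = small[["Product", "Value in GBP"]]  # list with two names -> DataFrame

print(type(one_column).__name__)        # Series
print(type(one_column_frame).__name__)  # DataFrame
print(two_columns.shape)                # (2, 2)
```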


There are numerous ways to subset rows. The most common is to create a boolean Series from a condition and use it to filter the data frame.

We can also select the rows that match a specific value using **==**.

If we want to subset on multiple conditions, we can combine them with the **&** operator, wrapping each condition in parentheses when it is written inline.

In [14]:
portfolio["Value in GBP"] > 5000

In [15]:
portfolio[portfolio["Value in GBP"] > 5000]

Unnamed: 0,Product,Symbol/ISIN,Amount,Closing,Local value,Value in GBP
8,BLUEFIELD SOLAR,GG00BB0RDB98,7000,133.0,GBX 931000.00,9310.0
23,ES 3400 P 18DEC26,,1,357.0,USD 17850.00,14556.74
24,ES 3400 P 19DEC25,,1,334.5,USD 16725.00,13639.3
34,HEND.FAR EAST,JE00B1GXH751,2000,290.0,GBX 580000.00,5800.0
40,ISH COREFTSE100,IE0005042456,700,732.3,GBX 512610.00,5126.1
41,ISHARES NASDAQ US BIOTECHNOLOGY...,IE00BYXG2H39,1200,4.95,EUR 5943.00,5045.85
44,ISHR ASIA PROP,IE00B1FZS244,500,1966.25,GBX 983125.00,9831.25
47,ISHR EU PROP,IE00B0M63284,400,3013.0,GBX 1205200.00,12052.0
52,ISHR JPM $ EMB,IE00B2NPKV68,70,7266.0,GBX 508620.00,5086.2
61,ISHR UK PROP,IE00B1TXLS18,1500,589.5,GBX 884250.00,8842.5


In [16]:
portfolio[portfolio["Product"] == "BLUEFIELD SOLAR"]

Unnamed: 0,Product,Symbol/ISIN,Amount,Closing,Local value,Value in GBP
8,BLUEFIELD SOLAR,GG00BB0RDB98,7000,133.0,GBX 931000.00,9310.0


In [17]:
value_5k = portfolio["Value in GBP"] > 5000
shares_100 = portfolio["Amount"] > 100

portfolio[value_5k & shares_100]

Unnamed: 0,Product,Symbol/ISIN,Amount,Closing,Local value,Value in GBP
8,BLUEFIELD SOLAR,GG00BB0RDB98,7000,133.0,GBX 931000.00,9310.0
34,HEND.FAR EAST,JE00B1GXH751,2000,290.0,GBX 580000.00,5800.0
40,ISH COREFTSE100,IE0005042456,700,732.3,GBX 512610.00,5126.1
41,ISHARES NASDAQ US BIOTECHNOLOGY...,IE00BYXG2H39,1200,4.95,EUR 5943.00,5045.85
44,ISHR ASIA PROP,IE00B1FZS244,500,1966.25,GBX 983125.00,9831.25
47,ISHR EU PROP,IE00B0M63284,400,3013.0,GBX 1205200.00,12052.0
61,ISHR UK PROP,IE00B1TXLS18,1500,589.5,GBX 884250.00,8842.5
62,ISHR US PROP,IE00B1FZSF77,474,2506.0,GBX 1187844.00,11878.44
73,POWERSHS EQQQ,IE0032077012,110,24698.0,GBX 2716780.00,27167.8


In [18]:
# This can also be done in one line of code

portfolio[(portfolio["Value in GBP"] > 5000) & (portfolio["Amount"] > 100)]

Unnamed: 0,Product,Symbol/ISIN,Amount,Closing,Local value,Value in GBP
8,BLUEFIELD SOLAR,GG00BB0RDB98,7000,133.0,GBX 931000.00,9310.0
34,HEND.FAR EAST,JE00B1GXH751,2000,290.0,GBX 580000.00,5800.0
40,ISH COREFTSE100,IE0005042456,700,732.3,GBX 512610.00,5126.1
41,ISHARES NASDAQ US BIOTECHNOLOGY...,IE00BYXG2H39,1200,4.95,EUR 5943.00,5045.85
44,ISHR ASIA PROP,IE00B1FZS244,500,1966.25,GBX 983125.00,9831.25
47,ISHR EU PROP,IE00B0M63284,400,3013.0,GBX 1205200.00,12052.0
61,ISHR UK PROP,IE00B1TXLS18,1500,589.5,GBX 884250.00,8842.5
62,ISHR US PROP,IE00B1FZSF77,474,2506.0,GBX 1187844.00,11878.44
73,POWERSHS EQQQ,IE0032077012,110,24698.0,GBX 2716780.00,27167.8
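The cells above only combine conditions with `&`. As a sketch of how the masks behave, the snippet below prints a mask and also uses `|` (or) and `~` (not), which work the same way. The frame `small` is typed by hand from four rows shown in the outputs above and is an assumption for illustration.

```python
import pandas as pd

small = pd.DataFrame({
    "Product": ["BLUEFIELD SOLAR", "ES 3400 P 18DEC26", "BAE SYS.", "VOLTA FIN"],
    "Amount": [7000, 1, 20, 600],
    "Value in GBP": [9310.0, 14556.74, 148.24, 3005.60],
})

# Each comparison gives a boolean Series with one True/False per row
value_5k = small["Value in GBP"] > 5000
shares_100 = small["Amount"] > 100
print(value_5k.tolist())  # [True, True, False, False]

both = small[value_5k & shares_100]        # rows where both are True
either = small[value_5k | shares_100]      # rows where at least one is True
neither = small[~(value_5k | shares_100)]  # rows where neither is True

print(both["Product"].tolist())     # ['BLUEFIELD SOLAR']
print(either["Product"].tolist())   # ['BLUEFIELD SOLAR', 'ES 3400 P 18DEC26', 'VOLTA FIN']
print(neither["Product"].tolist())  # ['BAE SYS.']
```

The parentheses in the one-line version matter because `&` binds more tightly than `>` in Python, so without them the expression is grouped in a way that does not compare the columns as intended.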


The easiest way to check whether the values in a column appear in a list of values is to use the **.isin()** method.

In [19]:
is_property = portfolio["Product"].isin(
    ["ISHR EU PROP", "ISHR US PROP", "ISHR ASIA PROP", "ISHR UK PROP"]
)

portfolio[is_property]

Unnamed: 0,Product,Symbol/ISIN,Amount,Closing,Local value,Value in GBP
44,ISHR ASIA PROP,IE00B1FZS244,500,1966.25,GBX 983125.00,9831.25
47,ISHR EU PROP,IE00B0M63284,400,3013.0,GBX 1205200.00,12052.0
61,ISHR UK PROP,IE00B1TXLS18,1500,589.5,GBX 884250.00,8842.5
62,ISHR US PROP,IE00B1FZSF77,474,2506.0,GBX 1187844.00,11878.44
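To show what `.isin()` saves us from writing, here is a short sketch comparing it with the equivalent chain of `==` tests joined by `|`, plus the inverted mask. The frame `small` and the list `wanted` are assumptions for illustration, typed by hand from rows shown in the outputs above.

```python
import pandas as pd

small = pd.DataFrame({
    "Product": ["ISHR EU PROP", "ISHR US PROP", "BAE SYS.", "UNILEVER"],
    "Value in GBP": [12052.0, 11878.44, 148.24, 112.41],
})

wanted = ["ISHR EU PROP", "ISHR US PROP"]

# .isin() tests each value in the column against the list
with_isin = small[small["Product"].isin(wanted)]

# The same filter written out as one == test per value
with_or = small[
    (small["Product"] == "ISHR EU PROP") | (small["Product"] == "ISHR US PROP")
]
print(with_isin.equals(with_or))  # True

# ~ inverts the mask, keeping everything not in the list
not_property = small[~small["Product"].isin(wanted)]
print(not_property["Product"].tolist())  # ['BAE SYS.', 'UNILEVER']
```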


# New Columns In Pandas

To add a new column, we assign to a column name that does not exist yet. The new column can be calculated from existing ones.

In [20]:
portfolio["Value in 000's"] = (portfolio["Value in GBP"] / 1000).round(2)

portfolio

Unnamed: 0,Product,Symbol/ISIN,Amount,Closing,Local value,Value in GBP,Value in 000's
0,ADVANCED MICRO DEVICES,US0079031078,2,95.12,USD 190.24,155.14,0.16
1,AIRBUS SE,NL0000235190,3,106.38,EUR 319.14,270.96,0.27
2,ALPHABET INC. - CLASS C,US02079K1079,1,2330.31,USD 2330.31,1900.38,1.90
3,AMAZON.COM INC. - COM,US0231351067,1,2261.10,USD 2261.10,1843.93,1.84
4,BAE SYS.,GB0002634946,20,741.20,GBX 14824.00,148.24,0.15
...,...,...,...,...,...,...,...
86,UNILEVER,GB00B10RZP78,3,3747.00,GBX 11241.00,112.41,0.11
87,VOLTA FIN,GG00B1GHHH78,600,5.90,EUR 3540.00,3005.60,3.01
88,WALT DISNEY COMPANY (T,US2546871060,3,107.33,USD 321.99,262.58,0.26
89,XTRACKERS MSCI SINGAPORE UCITS ...,LU0659578842,1200,1.27,EUR 1524.48,1294.35,1.29
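Bracket assignment changes the data frame in place. As a sketch of an alternative, the `.assign()` method returns a new data frame and leaves the original untouched. The frame `small` is typed by hand from three rows of the output above, and the column name `gbp_per_share` is hypothetical, chosen only for this example.

```python
import pandas as pd

small = pd.DataFrame({
    "Product": ["ADVANCED MICRO DEVICES", "AIRBUS SE", "BAE SYS."],
    "Amount": [2, 3, 20],
    "Value in GBP": [155.14, 270.96, 148.24],
})

# Bracket assignment adds the column to `small` itself
small["Value in 000's"] = (small["Value in GBP"] / 1000).round(2)
print(small["Value in 000's"].tolist())  # [0.16, 0.27, 0.15]

# .assign() returns a new data frame with the extra column (hypothetical name)
with_per_share = small.assign(
    gbp_per_share=small["Value in GBP"] / small["Amount"]
)

print("gbp_per_share" in with_per_share.columns)  # True
print("gbp_per_share" in small.columns)           # False
```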
