# Lesson 3 - Using Libraries

Using libraries is vital in Python. Most applications of Python will require you to use libraries, and they are probably the most powerful thing that Python offers.

Python, like most languages, comes with a "standard library": a collection of modules and functions that are included with the language by default. There are also libraries made by other people that you will need to install separately.

There are a variety of ways to access libraries:

In [1]:
import math #imports the entire math library from the standard library

import functools as func #imports the functools library and renames it to "func"
#this means that to access it we now just type func.XXXX

from math import pi #imports only the constant pi from the math library, so it can be used without the "math." prefix
pi

3.141592653589793

In [5]:
#you can also get functions using this syntax
from math import exp
exp(1) #raises e to the power of 1

2.718281828459045
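To see that these styles all reach the same objects, here is a small sketch using only the standard library. It also shows pulling in several names in one line and renaming a single function; the alias "fact" is just an illustrative choice, not a convention.

```python
import math                          # whole module: names are accessed as math.<name>
import functools as func             # whole module under a shorter alias
from math import pi, exp             # several names pulled in at once
from math import factorial as fact   # a single function, renamed

# Both routes point at the very same objects
same_pi = pi is math.pi
same_exp = exp is math.exp

# The alias behaves exactly like the original module name would
total = func.reduce(lambda a, b: a + b, [1, 2, 3, 4])

print(same_pi, same_exp, total, fact(5))  # True True 10 120
```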

### The standard data library stack

You are going to use:
1. Pandas – for handling and creating dataframes
1. Matplotlib – for plotting data
1. Numpy – for numerical computing and array generation
1. Statsmodels – for creating models like linear regressions and time series models
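None of these four are part of the standard library, so it's worth checking they are installed before going further. This is a minimal sketch using importlib.util.find_spec from the standard library; the helper name missing_libraries is made up for illustration, and the aliases in the comment are the usual community conventions rather than requirements.

```python
import importlib.util

# The four libraries from the list above. By convention they are usually imported as:
#   import pandas as pd, import matplotlib.pyplot as plt,
#   import numpy as np, import statsmodels.api as sm
STACK = ["pandas", "matplotlib", "numpy", "statsmodels"]


def missing_libraries(names):
    """Return the names Python cannot find in the current environment (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_libraries(STACK)
if missing:
    # Third-party libraries are installed separately, for example with pip
    print("Install with: python -m pip install " + " ".join(missing))
else:
    print("All four libraries are available.")
```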

I've added a chess games dataset from Kaggle to explore.

I'm going to import Pandas and use it to read the csv data into a dataframe.

In [5]:
import pandas as pd #renaming pandas to pd for quicker coding

df = pd.read_csv("games.csv") 
#calling the read_csv function and giving it the name of the data file

In [8]:
df.head(3)
#this shows us the first three rows of the dataframe so we can get a better look at the data

| | id | rated | created_at | last_move_at | turns | victory_status | winner | increment_code | white_id | white_rating | black_id | black_rating | moves | opening_eco | opening_name | opening_ply |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TZJHLljE | False | 1504210000000.0 | 1504210000000.0 | 13 | outoftime | white | 15+2 | bourgris | 1500 | a-00 | 1191 | d4 d5 c4 c6 cxd5 e6 dxe6 fxe6 Nf3 Bb4+ Nc3 Ba5... | D10 | Slav Defense: Exchange Variation | 5 |
| 1 | l1NXvwaE | True | 1504130000000.0 | 1504130000000.0 | 16 | resign | black | 5+10 | a-00 | 1322 | skinnerua | 1261 | d4 Nc6 e4 e5 f4 f6 dxe5 fxe5 fxe5 Nxe5 Qd4 Nc6... | B00 | Nimzowitsch Defense: Kennedy Variation | 4 |
| 2 | mIICvQHh | True | 1504130000000.0 | 1504130000000.0 | 61 | mate | white | 5+10 | ischia | 1496 | a-00 | 1500 | e4 e5 d3 d6 Be3 c6 Be2 b5 Nd2 a5 a4 c5 axb5 Nc... | C20 | King's Pawn Game: Leonardis Variation | 3 |
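If you don't have games.csv to hand, you can still follow along. This sketch builds a tiny stand-in from a few rows and columns copied from the outputs in this lesson (an assumption for illustration, not the real dataset). It relies on read_csv accepting a file-like object such as io.StringIO as well as a file name.

```python
import io

import pandas as pd

# A few games copied from the outputs shown in this lesson (only some of the columns),
# stored as CSV text so the example runs without games.csv
CSV_TEXT = """id,rated,turns,victory_status,winner,white_rating,black_rating
TZJHLljE,False,13,outoftime,white,1500,1191
l1NXvwaE,True,16,resign,black,1322,1261
mIICvQHh,True,61,mate,white,1496,1500
qwU9rasv,True,33,resign,white,1520,1423
RVN0N3VK,False,9,resign,black,1413,2108
"""

# read_csv works on any file-like object, not just a path on disk
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.head(3))   # first three rows, as with the real data
print(df.dtypes)    # "True"/"False" text is parsed as a bool column
```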


#### Pandas
This is where you're going to have to start using the documentation and the internet to figure a few things out for yourself. I am going to give you some notable examples, and you can ask me questions, but there's no way I can include everything here.

You can index pandas dataframes using [] notation, much like a dictionary, where the key is the column name as a string. For example, if you wanted only the victory conditions you could use the following, which returns that single column as a pandas Series:

In [25]:
df["victory_status"]

0        outoftime
1           resign
2             mate
3             mate
4             mate
           ...    
20053       resign
20054         mate
20055         mate
20056       resign
20057         mate
Name: victory_status, Length: 20058, dtype: object
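Selecting with a single name and selecting with a list of names give back different types. A quick sketch on a hypothetical three-row stand-in for the chess data (values copied from the outputs above):

```python
import pandas as pd

# Hypothetical stand-in: the first three games shown earlier, three columns only
df = pd.DataFrame({
    "victory_status": ["outoftime", "resign", "mate"],
    "winner": ["white", "black", "white"],
    "turns": [13, 16, 61],
})

one_column = df["victory_status"]               # a single name gives a pandas Series
two_columns = df[["victory_status", "turns"]]   # a list of names gives a DataFrame

print(type(one_column).__name__)   # Series
print(type(two_columns).__name__)  # DataFrame
print(two_columns.shape)           # (3, 2)
```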

And if you wanted a column showing whether or not each game ended in checkmate, you can just attach a comparison operator. This returns a Series of True/False values:

In [26]:
df["victory_status"] == "mate"

0        False
1        False
2         True
3         True
4         True
         ...  
20053    False
20054     True
20055     True
20056    False
20057     True
Name: victory_status, Length: 20058, dtype: bool
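Because True counts as 1 and False as 0, a True/False Series like this one is handy for counting. A short sketch on the first five victory statuses shown earlier (a hypothetical stand-in for the full column):

```python
import pandas as pd

# Hypothetical stand-in: the first five victory statuses from the output above
victory_status = pd.Series(["outoftime", "resign", "mate", "mate", "mate"])

is_mate = victory_status == "mate"   # element-wise comparison gives a bool Series
print(is_mate.tolist())              # [False, False, True, True, True]

# sum() counts the True values and mean() gives the share of True values
print(is_mate.sum(), is_mate.mean())  # 3 0.6
```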

Say you wanted to select all the rows where the game ended in a resignation for a certain analysis. To do this you would use pandas' conditional (boolean) indexing syntax, which uses a True/False Series like the one above to keep only the rows marked True:

df[(conditional statement)]

Remember, the conditional statement could be as simple as checking whether the data equals a certain value, but you could also use the greater-than and less-than operators (>, <, >=, <=), and you can string together multiple conditional statements using & (and) and | (or), wrapping each statement in its own parentheses.
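Here is a sketch of these comparisons and combinations, plus ~ for "not", on a hypothetical five-game stand-in (values copied from rows shown in this lesson). The parentheses matter because &, | and ~ bind more tightly than comparisons such as == and >.

```python
import pandas as pd

# Hypothetical stand-in: five games whose values are copied from rows shown in this lesson
df = pd.DataFrame({
    "rated": [False, True, True, True, False],
    "turns": [13, 16, 61, 33, 9],
    "victory_status": ["outoftime", "resign", "mate", "resign", "resign"],
})

# A single comparison: games longer than 30 turns
long_games = df[df["turns"] > 30]

# | means "or": games that ended in checkmate or with a player running out of time
mate_or_timeout = df[(df["victory_status"] == "mate") | (df["victory_status"] == "outoftime")]

# & means "and": rated games that ended in a resignation
rated_resigns = df[(df["rated"] == True) & (df["victory_status"] == "resign")]

# ~ means "not": games that were not rated
unrated = df[~df["rated"]]

print(long_games.index.tolist())       # [2, 3]
print(mate_or_timeout.index.tolist())  # [0, 2]
print(rated_resigns.index.tolist())    # [1, 3]
print(unrated.index.tolist())          # [0, 4]
```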

In [30]:
resign_df = df[(df["victory_status"] == "resign")]
resign_df.head(5)

| | id | rated | created_at | last_move_at | turns | victory_status | winner | increment_code | white_id | white_rating | black_id | black_rating | moves | opening_eco | opening_name | opening_ply |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | l1NXvwaE | True | 1504130000000.0 | 1504130000000.0 | 16 | resign | black | 5+10 | a-00 | 1322 | skinnerua | 1261 | d4 Nc6 e4 e5 f4 f6 dxe5 fxe5 fxe5 Nxe5 Qd4 Nc6... | B00 | Nimzowitsch Defense: Kennedy Variation | 4 |
| 6 | qwU9rasv | True | 1504230000000.0 | 1504230000000.0 | 33 | resign | white | 10+0 | capa_jr | 1520 | daniel_likes_chess | 1423 | d4 d5 e4 dxe4 Nc3 Nf6 f3 exf3 Nxf3 Nc6 Bb5 a6 ... | D00 | Blackmar-Diemer Gambit: Pietrowsky Defense | 10 |
| 7 | RVN0N3VK | False | 1503680000000.0 | 1503680000000.0 | 9 | resign | black | 15+30 | daniel_likes_chess | 1413 | soultego | 2108 | e4 Nc6 d4 e5 d5 Nce7 c3 Ng6 b4 | B00 | Nimzowitsch Defense: Kennedy Variation \| Link... | 5 |
| 8 | dwF3DJHO | True | 1503510000000.0 | 1503510000000.0 | 66 | resign | black | 15+0 | ehabfanri | 1439 | daniel_likes_chess | 1392 | e4 e5 Bc4 Nc6 Nf3 Nd4 d3 Nxf3+ Qxf3 Nf6 h3 Bc5... | C50 | Italian Game: Schilling-Kostic Gambit | 6 |
| 11 | Vf5fKWzI | False | 1503350000000.0 | 1503350000000.0 | 38 | resign | black | 20+60 | daniel_likes_chess | 1381 | subham777 | 1867 | e4 e6 d4 d5 e5 c5 c3 Nc6 Nf3 Qb6 Be3 Qxb2 Nbd2... | C02 | French Defense: Advance Variation \| Paulsen A... | 9 |
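If you only need some of the columns from the matching rows, .loc can take the same True/False mask for the rows and a list of column names in one step. A sketch on a hypothetical five-game stand-in (values copied from the outputs in this lesson):

```python
import pandas as pd

# Hypothetical stand-in: five games copied from the outputs in this lesson
df = pd.DataFrame({
    "id": ["TZJHLljE", "l1NXvwaE", "mIICvQHh", "qwU9rasv", "RVN0N3VK"],
    "victory_status": ["outoftime", "resign", "mate", "resign", "resign"],
    "winner": ["white", "black", "white", "white", "black"],
})

is_resign = df["victory_status"] == "resign"

# .loc[rows, columns]: the boolean mask picks the rows and the list picks the columns
resign_winners = df.loc[is_resign, ["id", "winner"]]
print(resign_winners)
```

If you plan to modify the result afterwards, calling .copy() on it avoids pandas' SettingWithCopyWarning.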


As you can see, this has only returned rows where the victory condition is a resignation. If you were only interested in games that were rated and reached the late game (say, 50 turns or more), you could do it like this. Note the & between the two conditions: using Python's and keyword instead raises "ValueError: The truth value of a Series is ambiguous".

In [37]:
rated_lategame = df[(df["rated"] == True) & (df["turns"] >= 50)]
rated_lategame.head(5)
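The and keyword fails because Python's and needs a single True or False for the whole Series, and pandas refuses to guess which one you mean. The sketch below, on a hypothetical stand-in of five games, reproduces the error and then the element-wise fix:

```python
import pandas as pd

# Hypothetical stand-in: the rated flag and turn count of five games shown in this lesson
rated = pd.Series([False, True, True, True, False])
turns = pd.Series([13, 16, 61, 33, 9])

# "and" calls bool() on the whole Series, which pandas refuses with a ValueError
try:
    mask = (rated == True) and (turns >= 50)
except ValueError as err:
    error_message = str(err)

# "&" compares element by element instead, giving one True/False per game
mask = (rated == True) & (turns >= 50)
print(mask.tolist())  # [False, False, True, False, False]
```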

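You can also group the rows by the values in a column and total each group. For example, to add up the turns played in unrated and rated games:

In [39]:
turns_by_rated = df.groupby("rated")["turns"].sum()
turns_by_rated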

rated
False     211822
True     1001005
Name: turns, dtype: int64
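groupby works by splitting the rows according to the values in one column, then aggregating another column within each group. A sketch on a hypothetical five-game stand-in, which also uses agg to compute several statistics per group at once:

```python
import pandas as pd

# Hypothetical stand-in: five games copied from rows shown in this lesson
df = pd.DataFrame({
    "rated": [False, True, True, True, False],
    "turns": [13, 16, 61, 33, 9],
})

# Split by "rated", then sum the "turns" column within each group
turns_by_rated = df.groupby("rated")["turns"].sum()
print(turns_by_rated)

# agg() applies several aggregations per group and returns one column for each
summary = df.groupby("rated")["turns"].agg(["count", "sum", "mean"])
print(summary)
```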

In [49]:
import matplotlib.pyplot as plt
