<a href="https://colab.research.google.com/github/OptimalDecisions/sports-analytics-foundations/blob/main/pandas-basics/Pandas_Basics_2_2_Reading_Files.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

 ## Pandas Basics 2.2



  # Reading Data from files

  <img src = "../img/sa_logo.png" width="100" align="left">

  Ram Narasimhan

  <br><br><br>

  <<   [2.1 Data Structures](Pandas_Basics_2_1_Data_Structures.ipynb)|   [2.2 Reading Data from files](Pandas_Basics_2_2_Reading_Files.ipynb) |    [2.3 Examining, Describing & Summarizing Data](Pandas_Basics_2_3_Exploring_Data.ipynb) >>



## Importing Commonly used modules

Most pandas notebooks rely on the same few companion modules, so it is convenient to import them all in one cell. These commands can go at the start of any of your notebooks. (Matplotlib and Seaborn are plotting packages.)

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline


## `read_csv` - Read from CSV files

In [6]:
import os

print(os.getcwd())

/content


In [13]:
url = "https://raw.githubusercontent.com/OptimalDecisions/sports-analytics-foundations/main/data/2022-2023%20NBA%20Player%20Stats%20-%20Regular.csv"
df = pd.read_csv(url)


UnicodeDecodeError: ignored

This dataset comes from Kaggle, and it contains a few characters that are not valid UTF-8 (the default encoding `read_csv` assumes), which is why we get a `UnicodeDecodeError`.

To fix such errors, we usually have to change the encoding. Also, if we look at our file, we see that the entries are separated by semicolons (;) and not commas, so we need to tell pandas about the separator as well. Let's fix both.


In [16]:
df = pd.read_csv(url, encoding = "ISO-8859-1", sep=";")
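
To see what each of the two arguments contributes, here is a minimal sketch on a tiny in-memory sample rather than the real file. The rows and the names `sample_text`, `one_column`, and `sample_df` are made up for illustration; only `encoding` and `sep` come from the call above.

```python
import io
import pandas as pd

# Made-up rows for illustration; "é" is the single byte 0xE9 in ISO-8859-1.
sample_text = "Player;Pos;PTS\nJosé Example;PG;9.0\nThéo Sample;SG;4.5\n"
raw_bytes = sample_text.encode("ISO-8859-1")

# With default settings pandas assumes UTF-8, so the 0xE9 byte cannot be decoded.
try:
    pd.read_csv(io.BytesIO(raw_bytes))
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# The right encoding alone is not enough: without sep=";" each row lands in one column.
one_column = pd.read_csv(io.BytesIO(raw_bytes), encoding="ISO-8859-1")

# Encoding and separator together give the intended three-column table.
sample_df = pd.read_csv(io.BytesIO(raw_bytes), encoding="ISO-8859-1", sep=";")

print(utf8_failed, one_column.shape, sample_df.shape)
```

If a file still fails to load with `ISO-8859-1`, it may use a different encoding altogether, so treat this value as a reasonable first guess rather than a universal fix.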


In [17]:
df

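|     | Rk  | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | ... | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Precious Achiuwa | C | 23 | TOR | 55 | 12 | 20.7 | 3.6 | 7.3 | ... | 0.702 | 1.8 | 4.1 | 6.0 | 0.9 | 0.6 | 0.5 | 1.1 | 1.9 | 9.2 |
| 1 | 2 | Steven Adams | C | 29 | MEM | 42 | 42 | 27.0 | 3.7 | 6.3 | ... | 0.364 | 5.1 | 6.5 | 11.5 | 2.3 | 0.9 | 1.1 | 1.9 | 2.3 | 8.6 |
| 2 | 3 | Bam Adebayo | C | 25 | MIA | 75 | 75 | 34.6 | 8.0 | 14.9 | ... | 0.806 | 2.5 | 6.7 | 9.2 | 3.2 | 1.2 | 0.8 | 2.5 | 2.8 | 20.4 |
| 3 | 4 | Ochai Agbaji | SG | 22 | UTA | 59 | 22 | 20.5 | 2.8 | 6.5 | ... | 0.812 | 0.7 | 1.3 | 2.1 | 1.1 | 0.3 | 0.3 | 0.7 | 1.7 | 7.9 |
| 4 | 5 | Santi Aldama | PF | 22 | MEM | 77 | 20 | 21.8 | 3.2 | 6.8 | ... | 0.750 | 1.1 | 3.7 | 4.8 | 1.3 | 0.6 | 0.6 | 0.8 | 1.9 | 9.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 674 | 535 | Thaddeus Young | PF | 34 | TOR | 54 | 9 | 14.7 | 2.0 | 3.7 | ... | 0.692 | 1.3 | 1.8 | 3.1 | 1.4 | 1.0 | 0.1 | 0.8 | 1.6 | 4.4 |
| 675 | 536 | Trae Young | PG | 24 | ATL | 73 | 73 | 34.8 | 8.2 | 19.0 | ... | 0.886 | 0.8 | 2.2 | 3.0 | 10.2 | 1.1 | 0.1 | 4.1 | 1.4 | 26.2 |
| 676 | 537 | Omer Yurtseven | C | 24 | MIA | 9 | 0 | 9.2 | 1.8 | 3.0 | ... | 0.833 | 0.9 | 1.7 | 2.6 | 0.2 | 0.2 | 0.2 | 0.4 | 1.8 | 4.4 |
| 677 | 538 | Cody Zeller | C | 30 | MIA | 15 | 2 | 14.5 | 2.5 | 3.9 | ... | 0.686 | 1.7 | 2.6 | 4.3 | 0.7 | 0.2 | 0.3 | 0.9 | 2.2 | 6.5 |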
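
When the delimiter is not known in advance, pandas can try to infer it. The sketch below is one possible approach, not the method used in this notebook: passing `sep=None` together with `engine="python"` asks pandas to detect the separator. The sample rows and the names `sample_text` and `sniffed_df` are made up for illustration.

```python
import io
import pandas as pd

# Made-up sample in the same semicolon-delimited layout as the file above.
sample_text = "Rk;Player;Pos;Age\n1;Player A;C;23\n2;Player B;PF;29\n"

# sep=None asks pandas to infer the delimiter; this only works with the Python engine.
sniffed_df = pd.read_csv(io.StringIO(sample_text), sep=None, engine="python")

print(sniffed_df.columns.tolist())
print(sniffed_df.shape)
```

Inference is a convenience and can guess wrong on unusual files, so passing `sep=";"` explicitly, as done above, remains the more predictable choice once the delimiter is known.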


Data often arrives in Excel files. For that case, pandas has a ready-made function to help us read the data.


## `read_excel` - Read directly from Excel files

The `read_excel` convenience function in pandas reads a sheet from an Excel file into a DataFrame.

📌 Note: Since Excel files can contain multiple "sheets", we might have to specify the sheet that we want pandas to read. If we do not, `read_excel` reads the first sheet (`sheet_name=0`).
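
As a rough illustration of working with sheets, the sketch below builds a small two-sheet workbook in memory and then reads it back. The sheet names (`Regular`, `Playoffs`), the values, and the variable names are all made up for this example, and it assumes the `openpyxl` package is installed (pandas uses it for `.xlsx` files).

```python
import io
import pandas as pd

# Made-up data and hypothetical sheet names, for illustration only.
regular = pd.DataFrame({"Player": ["Player A", "Player B"], "PTS": [9.2, 8.6]})
playoffs = pd.DataFrame({"Player": ["Player A"], "PTS": [11.0]})

# Write both DataFrames to one in-memory workbook (assumes openpyxl is available).
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    regular.to_excel(writer, sheet_name="Regular", index=False)
    playoffs.to_excel(writer, sheet_name="Playoffs", index=False)
buffer.seek(0)

# List the available sheets before choosing one.
workbook = pd.ExcelFile(buffer)
print(workbook.sheet_names)

# Select a sheet by name or by zero-based position.
by_name = pd.read_excel(workbook, sheet_name="Playoffs")
by_position = pd.read_excel(workbook, sheet_name=0)

# sheet_name=None returns a dict of DataFrames keyed by sheet name.
all_sheets = pd.read_excel(workbook, sheet_name=None)
print(list(all_sheets))
```

Checking `sheet_names` first is a cheap way to avoid reading the wrong sheet when a workbook comes from someone else.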


In [31]:
# URL of the Excel file
excel_url = 'https://github.com/OptimalDecisions/sports-analytics-foundations/raw/main/data/2022-2023%20NBA%20Player%20Stats%20-%20Playoffs.xlsx'



In [35]:

# Read the Excel file into a DataFrame
df_nba_stats = pd.read_excel(excel_url, sheet_name=0)

# Display the first few rows of the DataFrame
print(df_nba_stats.head())

  Rk;Player;Pos;Age;Tm;G;GS;MP;FG;FGA;FG%;3P;3PA;3P%;2P;2PA;2P%;eFG%;FT;FTA;FT%;ORB;DRB;TRB;AST;STL;BLK;TOV;PF;PTS
0  1;Bam Adebayo;C;25;MIA;23;23;37;7.3;15.1;0.481...                                                              
1  2;Santi Aldama;PF;22;MEM;6;0;16.8;2.5;5.5;0.45...                                                              
2  3;Nickeil Alexander-Walker;SG;24;MIN;5;4;29.6;...                                                              
3  4;Grayson Allen;SG;27;MIL;5;5;29.8;3.8;8.2;0.4...                                                              
4  5;Jarrett Allen;C;24;CLE;5;5;38.2;4.4;7.2;0.61...                                                              
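
The output above shows a single column whose header and values are semicolon-delimited strings, which suggests that each row of the sheet is stored as one text cell. One possible way to recover separate columns is sketched below on a small stand-in DataFrame. It reuses the first five fields of the rows printed above; the names `raw`, `single_col`, and `split_df` are hypothetical.

```python
import pandas as pd

# Stand-in for df_nba_stats: one column holding semicolon-delimited text
# (first five fields of the rows shown in the output above).
raw = pd.DataFrame(
    {"Rk;Player;Pos;Age;Tm": ["1;Bam Adebayo;C;25;MIA", "2;Santi Aldama;PF;22;MEM"]}
)

# The only column name doubles as the header row.
single_col = raw.columns[0]

# Split every value on ";" into separate columns, then name them from the header.
split_df = raw[single_col].str.split(";", expand=True)
split_df.columns = single_col.split(";")

# The split values are still text, so convert the numeric columns explicitly.
split_df = split_df.astype({"Rk": "int64", "Age": "int64"})

print(split_df)
```

On the real DataFrame the same steps would apply to all 30 fields, with the stat columns converted to floats rather than integers.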


