# Tabular Gathering

### NBA combine data

This data was gathered from Kaggle and represents the NBA draft combine testing results from 2000 to 2022. The NBA combine is an event in which NBA draft prospects are tested and measured on a number of physical and athletic attributes, including their standing vertical jump and their running (max) vertical jump. It is one of the few events in sports in which players' physical attributes and athletic abilities are measured in an official and objective manner. I selected this dataset because it can be used to test whether other physical measurements predict vertical jump, and because it can be merged with season and game data to see whether vertical jump has an effect on in-game performance.

[Source Link](https://www.kaggle.com/datasets/marcusfern/nba-draft-combine)

[Link to dataset](https://github.com/anly501/dsan-5000-project-thm12/blob/main/data/00-raw-data/NBA_Draft_Combine.csv)

![](images/nba_combine.png)
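
The merge with season data mentioned above could look roughly like the sketch below. It assumes the two tables are joined on player name, which is a guess at the key; the `player`, `max_vertical`, `season`, and `points` field names and the sample rows are hypothetical placeholders, not values from either dataset. Names are normalized first because spelling and punctuation often differ between sources.

```python
import unicodedata


def normalize_name(name):
    """Lowercase a player name and strip accents, punctuation, and extra spaces."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    kept = [ch for ch in ascii_name.lower() if ch.isalnum() or ch.isspace()]
    return " ".join("".join(kept).split())


def merge_on_player(combine_rows, season_rows):
    """Inner-join season rows to combine rows on the normalized player name."""
    combine_by_name = {normalize_name(row["player"]): row for row in combine_rows}
    merged = []
    for season in season_rows:
        combine = combine_by_name.get(normalize_name(season["player"]))
        if combine is not None:
            # Season fields overwrite combine fields that share a name.
            merged.append({**combine, **season})
    return merged


# Made-up illustrative rows (hypothetical field names and values).
combine_rows = [{"player": "Example Player Jr.", "max_vertical": 35.0}]
season_rows = [
    {"player": "example player jr", "season": 2015, "points": 10.0},
    {"player": "Another Player", "season": 2015, "points": 5.0},
]
merged = merge_on_player(combine_rows, season_rows)
```

With pandas, the same idea would be to add a normalized name column to both data frames and call `DataFrame.merge` on it with `how="inner"`.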

### NBA player season data

This data was gathered from an API using R. It represents NBA player statistics for each season from 2009 to 2023, including averages and totals for points, rebounds, assists, and other game stats. I included this dataset because I plan to merge it with the NBA combine data above to see whether vertical jump is related to in-game success on certain stats. To build a dataset with every player's stats from each season, I looped through the years, queried the data for each season, and appended the results to a single dataset.

[Source Link](https://sportsdata.io/developers/api-documentation/nba#/sports-data-feeds)

[Link to dataset](https://github.com/anly501/dsan-5000-project-thm12/blob/main/data/00-raw-data/api_nba_player_season.csv)

[Link to code](https://github.com/anly501/dsan-5000-project-thm12/blob/main/codes/01-data-gathering/data_gathering%26cleaning.Rmd)

![](images/nba_api.png)
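
The actual gathering was done in R (see the linked code). As a rough Python sketch of the same loop-and-append pattern, the example below calls a fetch function once per season and stacks the rows into one list. `fake_fetch_season` is a hypothetical stand-in that returns placeholder rows; it does not call the real API, and the `player` and `games` fields are assumptions made for illustration.

```python
def gather_seasons(fetch_season, first_season, last_season):
    """Call fetch_season once per season and stack all rows into one list."""
    all_rows = []
    for season in range(first_season, last_season + 1):
        for row in fetch_season(season):
            # Tag each row with its season so the stacked rows stay distinguishable.
            all_rows.append({**row, "season": season})
    return all_rows


def fake_fetch_season(season):
    """Hypothetical stand-in for the real API request; returns placeholder rows."""
    return [{"player": "Example Player", "games": 0}]


rows = gather_seasons(fake_fetch_season, 2009, 2023)
```

Passing the fetch function in as an argument keeps the loop testable without network access; a real request function could be swapped in later.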



### NFL combine data

This data was gathered from Kaggle and represents the NFL combine testing results from 2000 to 2018. The NFL combine follows the same concept as the NBA combine but is much better known and more widely covered, as many positions in football rely heavily on physical attributes and abilities. I selected this dataset to compare vertical jump results for basketball players against those of other athletes; the NFL is the only other league with a widely covered combine.

[Source Link](https://www.kaggle.com/datasets/savvastj/nfl-combine-data)

[Link to dataset](https://github.com/anly501/dsan-5000-project-thm12/blob/main/data/00-raw-data/nfl_combine.csv)

![](images/nfl.png)

### Olympic Track and Field data

This data was also gathered from Kaggle and represents the results from all Olympic track & field events from 1896 to 2016. I plan to look specifically at the data from the high jump event. While the measurement system is different for high jumping, I would still like to examine how results have changed over time and whether any patterns emerge by nationality.

[Source Link](https://www.kaggle.com/datasets/jayrav13/olympic-track-field-results)

[Link to dataset](https://github.com/anly501/dsan-5000-project-thm12/blob/main/data/00-raw-data/olympic_track.csv)

![](images/olympic.png)

### Stretching Study data

This dataset comes from a study of the effect of dynamic stretching on vertical jump height in recreational athletes and collegiate students. It includes key metrics on the participants and their vertical jump results from before and after dynamic stretching. It will be useful for examining the effect of dynamic stretching on vertical jump and whether other participant characteristics play a role.

[Source Link](https://data.mendeley.com/datasets/z7dbpvn64g/1)

[Link to dataset](https://github.com/anly501/dsan-5000-project-thm12/blob/main/data/00-raw-data/stretching.csv)

![](images/stretching.png)



# Text Gathering

### News API text data

This text data was queried from the News API and contains articles matching NBA player names from the NBA combine dataset. I created a function to query data based on a player name and then applied it to the name column of a cleaned version of the NBA combine dataset, storing the results in a new column. I saved the result in both CSV and .pkl format; the .pkl file preserves the structure of each query result for cleaning. The code used to query the data is below.

[Source Link](https://newsapi.org/docs)

[Link to dataset](https://github.com/anly501/dsan-5000-project-thm12/blob/main/dsan-website/5000-website/raw_text.txt)

```python
#| code-fold: true
import os

import pandas as pd
import requests

combine_df = pd.read_csv("../../data/01-modified-data/cleaned_NBA_combine.csv")

baseURL = "https://newsapi.org/v2/everything?"

# Read the key from the environment rather than hard-coding it in the notebook.
# NEWS_API_KEY is an assumed variable name.
API_KEY = os.environ.get("NEWS_API_KEY", "")


def fetch_data(TOPIC):
    """Query the News API for one player name and return the parsed JSON response."""
    URLpost = {'apiKey': API_KEY,
               'q': '+' + TOPIC,
               'sortBy': 'relevancy',
               'totalRequests': 1}
    response = requests.get(baseURL, params=URLpost)
    return response.json()


# Commented out to avoid going over the API query limit.
# combine_df["uncleaned_news_text"] = combine_df['Name'].apply(fetch_data)

# Save as pickle (keeps each response as a nested object) and as CSV.
combine_df.to_pickle("my_df.pkl")
combine_df.to_csv("../../data/00-raw-data/raw_text.csv")
```
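
To illustrate why the .pkl copy is kept alongside the CSV, here is a small standard-library sketch (not the project code). It round-trips a nested, response-shaped dictionary through CSV and through pickle. The `response` value is a made-up placeholder, not real API output.

```python
import ast
import csv
import io
import pickle

# Placeholder shaped like a nested JSON response; the values are made up.
response = {"status": "ok", "articles": [{"title": "Example headline"}]}

# CSV round trip: the nested dict is written as its string representation.
buffer = io.StringIO()
csv.writer(buffer).writerow(["Example Player", response])
buffer.seek(0)
name, from_csv = next(csv.reader(buffer))

# Pickle round trip: the nested dict comes back as a dict.
from_pickle = pickle.loads(pickle.dumps(response))

# The CSV text can be parsed back into a dict, but only with an extra step.
recovered = ast.literal_eval(from_csv)
```

After the CSV round trip the cell is plain text, while the pickled copy is still a dictionary whose `articles` list can be indexed directly. The same trade-off should apply to a data frame column holding response dictionaries, since CSV can only store them as text.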