In [1]:
from nba_api.stats.static import teams
from nba_api.stats.endpoints import leaguegamefinder
from elasticsearch import Elasticsearch, helpers
from getpass import getpass

You will want to get the team data from the NBA team static dataset, which has an ID for each team. You can use a list comprehension to find the team with the abbreviation of BOS for Boston. Once you get the complete Celtics object, you can narrow it down to just the ID, which you can use to find game data. 

In [2]:
nba_teams = teams.get_teams()
celtics = [team for team in nba_teams if team['abbreviation'] == 'BOS'][0]
celtics_id = celtics['id']
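
The `[0]` index in the cell above raises an `IndexError` if no team matches the abbreviation. A minimal sketch of a more defensive lookup follows. It uses a small hand-written list in place of the output of `teams.get_teams()`. The helper name `find_team` and the placeholder record are hypothetical and only mimic the `abbreviation` and `id` keys used above.

```python
# Hypothetical sample standing in for teams.get_teams().
# The BOS id is taken from the output shown in this notebook; "ZZZ" is a made-up placeholder.
sample_teams = [
    {"id": 1610612738, "abbreviation": "BOS"},
    {"id": 0, "abbreviation": "ZZZ"},
]

def find_team(team_list, abbreviation):
    """Return the first team dict with the given abbreviation, or None if absent."""
    return next(
        (team for team in team_list if team["abbreviation"] == abbreviation),
        None,
    )

celtics_sample = find_team(sample_teams, "BOS")
missing = find_team(sample_teams, "NOPE")

print(celtics_sample["id"])
print(missing)
```

Returning `None` lets the caller decide how to handle an unknown abbreviation instead of failing inside the list comprehension.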

Now you can use the Celtics' ID to get all the available game data for the team. To confirm that the data loaded correctly, you can view the first five results with the `.head()` method.

In [3]:
gamefinder = leaguegamefinder.LeagueGameFinder(team_id_nullable=celtics_id)
games = gamefinder.get_data_frames()[0]
games.head()

Unnamed: 0,SEASON_ID,TEAM_ID,TEAM_ABBREVIATION,TEAM_NAME,GAME_ID,GAME_DATE,MATCHUP,WL,MIN,PTS,...,FT_PCT,OREB,DREB,REB,AST,STL,BLK,TOV,PF,PLUS_MINUS
0,22023,1610612738,BOS,Boston Celtics,22300611,2024-01-22,BOS @ DAL,W,240,119,...,0.733,5,39,44,26,7,8,6,20,9.0
1,22023,1610612738,BOS,Boston Celtics,22300603,2024-01-21,BOS @ HOU,W,240,116,...,0.722,16,39,55,31,10,12,16,17,9.0
2,22023,1610612738,BOS,Boston Celtics,22300586,2024-01-19,BOS vs. DEN,L,240,100,...,0.714,12,26,38,21,5,2,2,17,-2.0
3,22023,1610612738,BOS,Boston Celtics,22300571,2024-01-17,BOS vs. SAS,W,240,117,...,0.789,12,42,54,22,5,6,12,15,19.0
4,22023,1610612738,BOS,Boston Celtics,22300563,2024-01-15,BOS @ TOR,W,240,105,...,0.92,5,48,53,20,3,7,12,16,9.0


While working with this data, I noticed that it includes pre-season games, so I used the season dates to narrow it down to the current season. In a Jupyter Notebook, you can call `current_season` to view the full DataFrame.

In [4]:
current_season = games.loc[(games['GAME_DATE'] >= '2023-10-24') & (games['GAME_DATE'] <= '2024-06-20')]
current_season

Unnamed: 0,SEASON_ID,TEAM_ID,TEAM_ABBREVIATION,TEAM_NAME,GAME_ID,GAME_DATE,MATCHUP,WL,MIN,PTS,...,FT_PCT,OREB,DREB,REB,AST,STL,BLK,TOV,PF,PLUS_MINUS
0,22023,1610612738,BOS,Boston Celtics,22300611,2024-01-22,BOS @ DAL,W,240,119,...,0.733,5,39,44,26,7,8,6,20,9.0
1,22023,1610612738,BOS,Boston Celtics,22300603,2024-01-21,BOS @ HOU,W,240,116,...,0.722,16,39,55,31,10,12,16,17,9.0
2,22023,1610612738,BOS,Boston Celtics,22300586,2024-01-19,BOS vs. DEN,L,240,100,...,0.714,12,26,38,21,5,2,2,17,-2.0
3,22023,1610612738,BOS,Boston Celtics,22300571,2024-01-17,BOS vs. SAS,W,240,117,...,0.789,12,42,54,22,5,6,12,15,19.0
4,22023,1610612738,BOS,Boston Celtics,22300563,2024-01-15,BOS @ TOR,W,240,105,...,0.92,5,48,53,20,3,7,12,16,9.0
5,22023,1610612738,BOS,Boston Celtics,22300542,2024-01-13,BOS vs. HOU,W,239,145,...,0.76,10,40,50,26,7,8,11,21,32.0
6,22023,1610612738,BOS,Boston Celtics,22300528,2024-01-11,BOS @ MIL,L,241,102,...,0.833,6,25,31,22,5,5,6,13,-33.0
7,22023,1610612738,BOS,Boston Celtics,22300517,2024-01-10,BOS vs. MIN,W,263,127,...,0.968,7,39,46,21,5,4,8,19,7.0
8,22023,1610612738,BOS,Boston Celtics,22300507,2024-01-08,BOS @ IND,L,240,131,...,0.69,13,29,42,26,3,8,15,20,-2.0
9,22023,1610612738,BOS,Boston Celtics,22300493,2024-01-06,BOS @ IND,W,240,118,...,0.526,13,43,56,26,6,6,17,20,17.0
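
The filter in cell 4 compares `GAME_DATE` values as strings. This works because the dates are in `YYYY-MM-DD` format, where alphabetical order matches chronological order. A small standard-library sketch of the same idea, using three dates from the output above plus one made-up pre-season date:

```python
from datetime import date

season_start, season_end = "2023-10-24", "2024-06-20"

# Three dates from the table above plus one hypothetical pre-season date (2023-10-08).
game_dates = ["2024-01-22", "2024-01-06", "2023-10-08", "2024-01-15"]

# Plain string comparison, as in the DataFrame filter.
by_string = [d for d in game_dates if season_start <= d <= season_end]

# The same filter after parsing every value into a real date object.
start, end = date.fromisoformat(season_start), date.fromisoformat(season_end)
by_date = [d for d in game_dates if start <= date.fromisoformat(d) <= end]

print(by_string)
print(by_string == by_date)
```

If the dates were stored in another format, such as `MM/DD/YYYY`, string comparison would no longer be reliable and the values would need to be parsed first.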


Since null values can create issues when loading your data into Elasticsearch, you can double-check that this data has none. The line below returns a boolean value that tells you whether the data contains any null values. Since it returns `False` for this dataset, there are no null values and no further cleaning is needed.

In [5]:
current_season.isnull().values.any()

False
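
If the check had returned `True`, the missing values would need handling before upload. One possible approach, sketched here with the standard library on a single made-up record rather than the real DataFrame, is to replace `NaN` with `None`, which serializes to JSON `null`. The helper name `nan_to_none` is hypothetical.

```python
import json
import math

def nan_to_none(record):
    """Return a copy of record with float NaN values replaced by None."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }

# Hypothetical record with a missing PLUS_MINUS value; the GAME_ID is a placeholder.
raw = {"GAME_ID": "0000000000", "WL": "W", "PLUS_MINUS": float("nan")}
cleaned = nan_to_none(raw)

print(json.dumps(cleaned))
```

Whether to null out, drop, or fill missing values depends on the analysis, so treat this as one option rather than a recommendation.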

## Step 2: Loading Boston Celtics data into Elasticsearch

After parsing and cleaning the NBA data, you can create variables for your Elastic Cloud ID and Elastic API Key. Using `getpass`, you can enter your credentials without displaying them in the notebook.

In [6]:
elastic_cloud_id = getpass("Elastic Cloud ID: ")
elastic_api_key = getpass("API Key: ")

Elastic Cloud ID:  ········
API Key:  ········


Once you've entered your credentials, you can connect to Elasticsearch using the [elasticsearch client](https://elasticsearch-py.readthedocs.io/en/v8.11.1/).

In [7]:
es = Elasticsearch(cloud_id=elastic_cloud_id, api_key=elastic_api_key)

Before you can load your data into Elastic, you must create an index. You can create one for the current season. 

In [9]:
es.indices.create(index="demo3_practice_current_season")

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'demo3_practice_current_season'})

You can create a function to load data from the current season into Elasticsearch. Each game is considered a document. 

In [10]:
timeframe = 'demo3_practice_current_season'

def doc_generator(df, timeframe):
    for index, document in df.iterrows():
        yield {
            "_index": timeframe, 
            "_id": f"{document['GAME_ID']}",
            "_source": document.to_dict(),
        }
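
To see the shape of what the generator yields without a DataFrame or a cluster, here is a sketch that applies the same pattern to a plain list of dictionaries. The function name `action_generator` is hypothetical, and the two rows are abbreviated from the table output above.

```python
def action_generator(records, index_name):
    """Yield one bulk action per record, using GAME_ID as the document ID."""
    for record in records:
        yield {
            "_index": index_name,
            "_id": str(record["GAME_ID"]),
            "_source": record,
        }

# Two rows abbreviated from the table above (only three of the columns).
rows = [
    {"GAME_ID": 22300611, "GAME_DATE": "2024-01-22", "WL": "W"},
    {"GAME_ID": 22300603, "GAME_DATE": "2024-01-21", "WL": "W"},
]

actions = list(action_generator(rows, "demo3_practice_current_season"))

print(len(actions))
print(actions[0])
```

Because `_id` is set to the game ID, uploading the same game again replaces the existing document rather than creating a duplicate.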

The `helpers` module of the Python client allows you to efficiently upload your DataFrame, which holds data on the current season's games, into Elasticsearch. Passing the `doc_generator` function you just created to `helpers.bulk` converts each row of your DataFrame into a document. The call returns the number of documents indexed successfully and a list of any errors.

In [11]:
helpers.bulk(es, doc_generator(current_season, timeframe))

(44, [])

## Step 3: Writing queries with Elasticsearch

Now that your data is loaded, you can start writing queries with Elasticsearch to learn more about how the Boston Celtics are performing this season. First, you can create a query that matches the games they have won so far this season and returns the count of wins as a result.

In [14]:
search_query = {
    "query": {
        "match": {
            "WL": "W"
        }
    }
}

games_won = es.count(index="demo3_practice_current_season", body=search_query)

While working with complex datasets, it is sometimes helpful to write sentences that summarize the results. Here is one example that reports how many games the Boston Celtics have won this season.

In [15]:
print(f"The Celtics won {games_won['count']} games this season so far.")

The Celtics won 34 games this season so far.


A streak in sports refers to a consecutive series of games or events in which a team or individual consistently wins or loses. Streaks are significant because they reflect a period of either exceptional performance (winning streak) or a challenging phase (losing streak). While analyzing how well a team is performing, examining its streaks is often helpful. You can create a query that returns each game's result sorted by game date.

In [16]:
streak_query = {
  "size": 1000,  
  "sort": [
    {
      "GAME_DATE": {
        "order": "asc"
      }
    }
  ],
  "_source": ["GAME_DATE", "WL"]
}

You can use the `es.search()` method to create a search based on the query above.

In [17]:
streak_search = es.search(
    index="demo3_practice_current_season",
    body=streak_query)

The following code builds a list of dictionaries, each holding a game date and that game's result.

In [18]:
gs = [hit['_source'] for hit in streak_search['hits']['hits']]
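
The list comprehension relies on the nesting of the search response, where each matching document sits under `hits.hits` and its fields under `_source`. A sketch with a hand-written stand-in for the response, which omits the other fields a real response carries:

```python
# Hand-written stand-in for a search response; values are taken from the table above.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "22300603", "_source": {"GAME_DATE": "2024-01-21", "WL": "W"}},
            {"_id": "22300611", "_source": {"GAME_DATE": "2024-01-22", "WL": "W"}},
        ]
    }
}

# Same extraction as above: keep only the stored fields of each hit.
sample_gs = [hit["_source"] for hit in sample_response["hits"]["hits"]]

print(sample_gs)
```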

To view the top five streaks of the season, you can build a list of `(result, length)` tuples, one per streak, and sort it by length.

In [19]:
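streaks = []
current_streak = 1
# Walk through the games in date order, comparing each result to the previous one.
for i in range(1, len(gs)):
    if gs[i]['WL'] == gs[i-1]['WL']:
        current_streak += 1
    else:
        # The result changed, so record the streak that just ended.
        streaks.append((gs[i-1]['WL'], current_streak))
        current_streak = 1

# Record the final streak, which the loop never closes.
streaks.append((gs[-1]['WL'], current_streak))
# Keep the five longest streaks.
top_streaks = sorted(streaks, key=lambda x: x[1], reverse=True)[:5]
print(top_streaks)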

[('W', 6), ('W', 6), ('W', 5), ('W', 5), ('W', 3)]
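
As an alternative to the explicit loop, the same run-length logic can be sketched with `itertools.groupby`, which groups consecutive equal values. The function name `find_streaks` and the sample sequence below are made up for illustration and are not the Celtics' actual results.

```python
from itertools import groupby

def find_streaks(results):
    """Collapse a chronological list of 'W'/'L' results into (result, length) tuples."""
    return [(result, len(list(group))) for result, group in groupby(results)]

# Made-up sequence of results, oldest game first.
sample_results = ["W", "W", "L", "W", "W", "W", "L", "L"]

sample_streaks = find_streaks(sample_results)
# sorted() is stable, so streaks of equal length keep their chronological order.
top_two = sorted(sample_streaks, key=lambda s: s[1], reverse=True)[:2]

print(sample_streaks)
print(top_two)
```

With the real data, the input would be `[game['WL'] for game in gs]`. This version also returns an empty list when there are no games, whereas `gs[-1]` in the loop above would raise an `IndexError`.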
