Welcome to my NBA analysis. If you like it, don't forget to upvote. Thank you!😁

![](https://cdn.pixabay.com/photo/2016/11/29/03/12/back-view-1867001_1280.jpg)

# 1-Exploration of the dataset

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px


route='../input/nba-injuries-2010-2018/injuries_2010-2020.csv'

data=pd.read_csv(route)

data.info()

In [None]:
data.head()

* Date: date when the player got hurt.
* Acquired: indicates if the player returned to the team or left the team because he needed a major medical procedure.
* Relinquished: the name of the player injured.
* Notes: description of the injury.

I'm going to drop the Acquired column because it's not useful for my analysis, and drop the rows with no value in Relinquished, because I only want to keep the rows that have a player associated.

In [None]:
data=data.drop('Acquired',axis=1)
data=data[data.Relinquished.notnull()]

In [None]:
data.info()

Fill the NaN values in Team with 'No team available'.

In [None]:
# fillna(..., inplace=True) returns None, so assign the filled column back instead
data['Team'] = data['Team'].fillna('No team available')

In [None]:
data.head()

Let's add the columns 'year' and 'month' to the dataset.

In [None]:
data['year'] = pd.DatetimeIndex(data['Date']).year
data['month'] = pd.DatetimeIndex(data['Date']).month

In [None]:
data.head()

In [None]:
data['year']=data.year.astype(str)
data['month']=data.month.astype(str)

Now I have the dataset that I wanted 👍
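
As a recap, the cleaning steps above can be sketched end to end on a tiny hand-made table. The rows below are invented purely for illustration (they are not taken from the dataset), and `pd.to_datetime` with the `.dt` accessor is used here as an alternative to `pd.DatetimeIndex`; both should give the same year and month values.

```python
import pandas as pd

# Invented sample rows that mimic the dataset's columns (not real data)
toy = pd.DataFrame({
    "Date": ["2019-12-30", "2020-01-02", "2020-01-05"],
    "Team": ["Heat", None, "Spurs"],
    "Acquired": [None, None, "Player C"],
    "Relinquished": ["Player A", "Player B", None],
    "Notes": ["sprained left ankle (DTD)", "sore right knee (DNP)", "returned to lineup"],
})

# Drop the unused column and keep only rows that name an injured player
toy = toy.drop("Acquired", axis=1)
toy = toy[toy["Relinquished"].notnull()].copy()

# Replace missing team names by plain assignment
toy["Team"] = toy["Team"].fillna("No team available")

# Derive year and month as strings, as in the cells above
dates = pd.to_datetime(toy["Date"])
toy["year"] = dates.dt.year.astype(str)
toy["month"] = dates.dt.month.astype(str)

print(toy)
```

Keeping `year` and `month` as strings means later comparisons have to use strings too, for example `'2020'` rather than `2020`.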

# 2-Data visualization

## A-Which teams had the most injuries from 2010 to 2020?

In [None]:
fig = px.histogram(data, x="Team")
fig.update_layout(
   paper_bgcolor='rgb(255,255,255)',
   plot_bgcolor='rgb(255,255,255)',
    font_family="Helvetica",
    font_color="gray",
    title_font_family="Helvetica",
    title_font_color="gray",
    legend_title_font_color="gray",
    xaxis = { 
    'showgrid': False, 
    'zeroline': True, 
    'visible': True,
    
    },
    yaxis = { 
    'showgrid': False, 
    'zeroline': True, 
    'visible': True,
    
    }
    
)
fig.show()

I'm going to count the Bobcats' injuries as Charlotte Hornets injuries, because Bobcats was the name of the Charlotte Hornets until 2014.

In [None]:
data['Team'] = np.where((data.Team == 'Bobcats'), 'Hornets', data.Team)

In [None]:
# redraw the plot with the Bobcats merged into the Hornets
fig = px.histogram(data, x="Team")
fig.update_layout(
   paper_bgcolor='rgb(255,255,255)',
   plot_bgcolor='rgb(255,255,255)',
    font_family="Helvetica",
    font_color="gray",
    title_font_family="Helvetica",
    title_font_color="gray",
    legend_title_font_color="gray",
    xaxis = { 
    'showgrid': False, 
    'zeroline': True, 
    'visible': True,
    
    },
    yaxis = { 
    'showgrid': False, 
    'zeroline': True, 
    'visible': True,
    
    }
    
)
fig.show()

The four teams with the most injuries were:

* Charlotte Hornets.
* Milwaukee Bucks.
* San Antonio Spurs.
* Houston Rockets.
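
The ranking above is read off the histogram. One possible way to get the same ranking as numbers is `value_counts`, sketched here on invented rows (the counts below are made up and say nothing about the real data):

```python
import pandas as pd

# Invented rows for illustration only
toy = pd.DataFrame({"Team": ["Bobcats", "Hornets", "Hornets", "Bucks", "Bucks", "Spurs"]})

# Merge the old franchise name into the current one, as done above with np.where
toy["Team"] = toy["Team"].replace({"Bobcats": "Hornets"})

# Count rows per team and keep the teams with the highest counts
top_teams = toy["Team"].value_counts().head(2)
print(top_teams)
```

On the notebook's dataframe, the equivalent call would presumably be `data['Team'].value_counts().head(4)`.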

If you want to check which teams had the most injuries during a certain year, do this:

In [None]:
# for example, during 2020 (year was converted to str above, so compare with '2020')
data2020=data[data.year=='2020']

fig = px.histogram(data2020, x="Team")
fig.update_layout(
   paper_bgcolor='rgb(255,255,255)',
   plot_bgcolor='rgb(255,255,255)',
    font_family="Helvetica",
    font_color="gray",
    title_font_family="Helvetica",
    title_font_color="gray",
    legend_title_font_color="gray",
    xaxis = { 
    'showgrid': False, 
    'zeroline': True, 
    'visible': True,
    
    },
    yaxis = { 
    'showgrid': False, 
    'zeroline': True, 
    'visible': True,
    
    }
    
)
fig.show()

Kudos to the Miami Heat for reaching the finals in spite of being the team with the most injuries. An amazing team with incredible management by Pat Riley, who brought in Tyler Herro and Duncan Robinson.
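
The per-year filter used above can be wrapped in a small helper. `injuries_in_year` is a hypothetical name and the rows are invented; the helper converts the year to a string because the `year` column was cast to `str` earlier in the notebook.

```python
import pandas as pd

def injuries_in_year(df, year):
    """Return the rows of df whose 'year' column matches the given year."""
    # The notebook stores year as a string, so compare strings to strings
    return df[df["year"] == str(year)]

# Invented rows for illustration only
toy = pd.DataFrame({
    "Team": ["Heat", "Heat", "Spurs"],
    "year": ["2020", "2019", "2020"],
})

subset = injuries_in_year(toy, 2020)
print(subset["Team"].value_counts())
```

Accepting either `2020` or `'2020'` avoids the silent empty result that an int-to-string comparison would give.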

## B-Players with the most injuries during 2020

In [None]:
moreThanFourInjuries=data2020.groupby("Relinquished").filter(lambda x: len(x) > 4)

In [None]:
fig = px.histogram(moreThanFourInjuries,x="Relinquished")
fig.update_layout(
   paper_bgcolor='rgb(255,255,255)',
   plot_bgcolor='rgb(255,255,255)',
    font_family="Helvetica",
    font_color="gray",
    title_font_family="Helvetica",
    title_font_color="gray",
    legend_title_font_color="gray",
    xaxis = { 
    'showgrid': False, 
    'zeroline': True, 
    'visible': True,
    
    },
    yaxis = { 
    'showgrid': False, 
    'zeroline': True, 
    'visible': True,
    
    }
    
)
fig.show()

The three players with the most injuries during 2020 were:

* Gabe Vincent.
* Russell Westbrook.
* Draymond Green.
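
`groupby(...).filter(...)` keeps every row that belongs to a group passing the test, which is why the histogram can then count rows per player. A minimal sketch with invented player names and a lower threshold than the one used above:

```python
import pandas as pd

# Invented rows for illustration only
toy = pd.DataFrame({"Relinquished": ["Player A"] * 3 + ["Player B"] * 1 + ["Player C"] * 2})

# Keep only the rows of players who appear more than once
frequent = toy.groupby("Relinquished").filter(lambda g: len(g) > 1)

# Count the remaining rows per player, highest first
counts = frequent["Relinquished"].value_counts()
print(counts)
```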

## C-Years with the most players injured

In [None]:
dataYear=data.groupby('year').size().reset_index(name='count')
dataYear=dataYear.sort_values('count', ascending=False)

fig = px.bar(dataYear,x='year',y='count')
fig.update_layout(
   paper_bgcolor='rgb(255,255,255)',
   plot_bgcolor='rgb(255,255,255)',
    font_family="Helvetica",
    font_color="gray",
    title_font_family="Helvetica",
    title_font_color="gray",
    legend_title_font_color="gray",
    xaxis = { 
    'showgrid': False, 
    'zeroline': True, 
    'visible': True,
    
    },
    yaxis = { 
    'showgrid': False, 
    'zeroline': True, 
    'visible': True,
    
    }
    
)
fig.show()
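
The bar chart above counts rows, so a player listed several times in one year is counted several times. If the number of distinct players per year is wanted instead, `nunique` is one option. A sketch on invented rows:

```python
import pandas as pd

# Invented rows for illustration only
toy = pd.DataFrame({
    "year": ["2019", "2019", "2019", "2020"],
    "Relinquished": ["Player A", "Player A", "Player B", "Player A"],
})

# Rows per year (what groupby(...).size() returns)
events_per_year = toy.groupby("year").size()

# Distinct players per year
players_per_year = toy.groupby("year")["Relinquished"].nunique()

print(events_per_year)
print(players_per_year)
```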

## D-Which types of injuries are most frequent?

In [None]:
from collections import Counter
import nltk

# If the stopword corpus is not installed yet, run nltk.download('stopwords') once
# A set gives exact word matching (testing against a joined string would match substrings)
stopwords = set(nltk.corpus.stopwords.words('english'))

words = (data.Notes
           .str.lower()
           .str.cat(sep=' ')
           .split()
)
# drop all stopwords
l=[]

for i in words:
    if i not in stopwords:
        l.append(i)

mostFrequentWordsInNotes = pd.DataFrame(Counter(l).most_common(20),
                    columns=['Word', 'Frequency'])




In [None]:
fig = px.bar(mostFrequentWordsInNotes, x='Word', y='Frequency')
fig.update_layout(
   paper_bgcolor='rgb(255,255,255)',
   plot_bgcolor='rgb(255,255,255)',
    font_family="Helvetica",
    font_color="gray",
    title_font_family="Helvetica",
    title_font_color="gray",
    legend_title_font_color="gray",
    xaxis = { 
    'showgrid': False, 
    'zeroline': True, 
    'visible': True,
    
    },
    yaxis = { 
    'showgrid': False, 
    'zeroline': True, 
    'visible': True,
    
    }
    
)
fig.show()

* DNP: in the NBA, it means Did Not Play.
* DTD: Day to Day.

If we look at the plot, it seems like knee and ankle injuries are the most frequent. 
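
The word count behind this plot can be sketched without NLTK. The stopword set and the notes below are small invented stand-ins (the real analysis uses NLTK's English list and the `Notes` column); the point is that each word is tested for exact membership in a set.

```python
from collections import Counter

import pandas as pd

# Tiny invented stand-ins for the NLTK stopword list and the Notes column
stopwords = {"in", "the", "of", "a", "with"}
notes = pd.Series([
    "sore left knee (DNP)",
    "sprained left ankle (DTD)",
    "surgery on the left knee",
])

# Lowercase, join all notes into one string, and split on whitespace
words = notes.str.lower().str.cat(sep=" ").split()

# Keep only words that are not stopwords (exact match, not substring match)
kept = [w for w in words if w not in stopwords]

most_common = pd.DataFrame(Counter(kept).most_common(3), columns=["Word", "Frequency"])
print(most_common)
```

Splitting on whitespace leaves punctuation attached, so `(dnp)` and `dnp` would be counted as different words; stripping punctuation first is an optional extra step.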

## E-Types of injuries of a certain player

In [None]:
data.Relinquished.unique()

In [None]:
westbrook=data[data.Relinquished=='Russell Westbrook']

# A set gives exact word matching (testing against a joined string would match substrings)
stopwords = set(nltk.corpus.stopwords.words('english'))

words = (westbrook.Notes
           .str.lower()
           .str.cat(sep=' ')
           .split()
)
# drop all stopwords
l=[]

for i in words:
    if i not in stopwords:
        l.append(i)

mostFrequentWordsInNotes = pd.DataFrame(Counter(l).most_common(10),
                    columns=['Word', 'Frequency'])


In [None]:
fig = px.bar(mostFrequentWordsInNotes, x='Word', y='Frequency')
fig.update_layout(
   paper_bgcolor='rgb(255,255,255)',
   plot_bgcolor='rgb(255,255,255)',
    font_family="Helvetica",
    font_color="gray",
    title_font_family="Helvetica",
    title_font_color="gray",
    legend_title_font_color="gray",
    xaxis = { 
    'showgrid': False, 
    'zeroline': True, 
    'visible': True,
    
    },
    yaxis = { 
    'showgrid': False, 
    'zeroline': True, 
    'visible': True,
    
    }
    
)
fig.show()

Westbrook had 12 knee-related events, and 6 surgery-related events. 
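
To repeat this for other players, the steps above could be wrapped in a function. `top_words_for_player` is a hypothetical helper, and the rows and the one-word stopword set are invented for illustration; with the real data, the NLTK stopword list would be passed in instead.

```python
from collections import Counter

import pandas as pd

def top_words_for_player(df, player, stopwords, n=10):
    """Return the n most frequent non-stopword words in one player's notes."""
    notes = df.loc[df["Relinquished"] == player, "Notes"]
    words = notes.str.lower().str.cat(sep=" ").split()
    kept = [w for w in words if w not in stopwords]
    return pd.DataFrame(Counter(kept).most_common(n), columns=["Word", "Frequency"])

# Invented rows and a tiny invented stopword set, for illustration only
toy = pd.DataFrame({
    "Relinquished": ["Player A", "Player A", "Player B"],
    "Notes": ["sore right knee", "surgery on right knee", "sprained ankle"],
})
result = top_words_for_player(toy, "Player A", stopwords={"on"}, n=2)
print(result)
```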