# Pandas Inclusive Question Examples
Please use the nba_players dataset. If you don't have it, you can download it here: __[Kaggle NBA Players Since 1950](https://www.kaggle.com/datasets/drgilermo/nba-players-stats)__

## Questions you are asked to answer
- Question 1: 
> Get the first 20 records

- Question 2: 
> Get the total number of records

- Question 3: 
> Find the tallest player

- Question 4: 
> Find the shortest player

- Question 5: 
> Find the total height of all players 

- Question 6: 
> Find the average weight of all players

- Question 7: 
> Find the name and college of the players who are younger than the average age

- Question 8: 
> Find the college Don Carlson played for

- Question 9: 
> Find the average height and weight of players by college

- Question 10: 
> Find the average age of players by college

- Question 11:
> Find out how many different colleges there are

- Question 12:
> Find out how many players played at each college

- Question 13:
> Find players with 'eb' in their name

- Question 14:
> Find players whose names start with J

- Note:
> Please try answering each question yourself first

In [5]:
# import libraries
import numpy as np
import pandas as pd

pd.set_option("display.max_columns",None)
pd.set_option("display.max_rows",None)
# pd.set_option("display.width", 500)

In [6]:
# load dataset
file_path = r"D:\PandasForDataAnalysis\datasets\nba_players.csv"
df = pd.read_csv(file_path)

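# check the df
# pd.read_csv never returns None; it raises FileNotFoundError itself when the
# path is wrong, so guard against a file that loaded without any rows instead
if df.empty:
    raise ValueError("The dataset was loaded but contains no rows. Please check your file.")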

df.head()

Unnamed: 0,Player_ID,Player,height,weight,collage,born,birth_city,birth_state
0,0,Curly Armstrong,180.0,77.0,Indiana University,1918.0,,
1,1,Cliff Barker,188.0,83.0,University of Kentucky,1921.0,Yorktown,Indiana
2,2,Leo Barnhorst,193.0,86.0,University of Notre Dame,1924.0,,
3,3,Ed Bartels,196.0,88.0,North Carolina State University,1925.0,,
4,4,Ralph Beard,178.0,79.0,University of Kentucky,1927.0,Hardinsburg,Kentucky


In [7]:
# 1.
first_twenty_rows = df.head(20)
print("*********************** First 20 Records ***********************\n")
print(first_twenty_rows)

*********************** First 20 Records ***********************

    Player_ID            Player  height  weight  \
0           0   Curly Armstrong   180.0    77.0   
1           1      Cliff Barker   188.0    83.0   
2           2     Leo Barnhorst   193.0    86.0   
3           3        Ed Bartels   196.0    88.0   
4           4       Ralph Beard   178.0    79.0   
5           5        Gene Berce   180.0    79.0   
6           6     Charlie Black   196.0    90.0   
7           7       Nelson Bobb   183.0    77.0   
8           8   Jake Bornheimer   196.0    90.0   
9           9      Vince Boryla   196.0    95.0   
10         10         Don Boven   193.0    95.0   
11         11     Harry Boykoff   208.0   102.0   
12         12       Joe Bradley   190.0    79.0   
13         13       Bob Brannum   196.0    97.0   
14         14        Carl Braun   196.0    81.0   
15         15     Frankie Brian   185.0    81.0   
16         16  Price Brookfield   193.0    83.0   
17         17   

In [8]:
# 2.
df_total_records = df.shape[0]
print(f"Total Records: {df_total_records}")

Total Records: 3922


In [19]:
# 3. 
tallest_player_height = df.height.max()
filtered = (df["height"] == tallest_player_height)
tallest_player = df[filtered][["Player_ID","Player"]]
print(f"Tallest Player's Height: {tallest_player_height}\n")
print("Player's Info:\n",tallest_player)

Tallest Player's Height: 231.0

Player's Info:
       Player_ID            Player
1711       1711        Manute Bol
2297       2297  Gheorghe Muresan
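
Two players share the maximum height here, which is why the filter uses a comparison against `max()` rather than a single index lookup. The sketch below illustrates the difference on a small made-up frame (the `toy` rows are hypothetical, not taken from the dataset): the boolean mask keeps every tied row, while `idxmax()` returns only the first one.

```python
import pandas as pd

# Hypothetical sample rows, used only to illustrate the technique.
toy = pd.DataFrame(
    {
        "Player": ["A", "B", "C", "D"],
        "height": [231.0, 160.0, 231.0, 198.0],
    }
)

# The boolean mask keeps all rows tied for the maximum height.
tallest = toy.loc[toy["height"] == toy["height"].max(), ["Player", "height"]]

# idxmax() returns the index label of the first maximum only.
first_tallest_label = toy["height"].idxmax()

print(tallest)
print(first_tallest_label)
```

The same pattern with `min()` and `idxmin()` applies to the shortest player.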


In [23]:
# 4
shortest_player_height = df.height.min()
filtered = (df["height"] == shortest_player_height)
shortest_player = df[filtered][["Player_ID","Player"]]
print(f"Shortest Player's Height: {shortest_player_height}\n")
print("Player's Info:\n",shortest_player)



Shortest Player's Height: 160.0

Player's Info:
       Player_ID         Player
1837       1837  Muggsy Bogues


In [24]:
# 5 
total_height = df["height"].sum()
print(f"Total Height: {total_height}")

Total Height: 779122.0


In [None]:
# 6
average_weight = df["weight"].mean()
print(f"Average Weight: {average_weight}")

Average Weight: 94.78321856669217
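
Both `sum()` and `mean()` skip missing values by default, which matters for a dataset with gaps in `height` and `weight`. A minimal sketch, using the heights of the first three players shown earlier plus one artificial missing value:

```python
import numpy as np
import pandas as pd

# Heights of the first three players shown above, plus one missing value
# added for illustration.
heights = pd.Series([180.0, 188.0, 193.0, np.nan], name="height")

total = heights.sum()     # NaN is skipped
average = heights.mean()  # divides by the 3 valid values, not by 4
valid = heights.count()   # number of non-missing entries

print(total, average, valid)
```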


In [51]:
# 7 add a new column 
import datetime as dt
df["Age"] = dt.datetime.now().year - df["born"]
df.tail()

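# count and inspect the rows whose age is unknown
df["Age"].isnull().sum()
df[df["Age"].isnull()] # 223

# drop rows with a missing age so the column can be cast to int below
df.dropna(subset=["Age"], inplace = True)
df["Age"].isnull().sum() # 0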
df.head()

df["Age"] = df["Age"].astype("int64")
df["Age"].dtype 

# print(df["Age"].min()) # 28
# print(df["Age"].mean())

filtered = (df["Age"] < df["Age"].mean())
df[filtered][["Player","collage"]]


Unnamed: 0,Player,collage
1397,Joe Barry,Georgia Institute of Technology
1651,Charles Barkley*,Auburn University
1652,Cory Blackwell,University of Wisconsin
1669,Stuart Gray,"University of California, Los Angeles"
1675,Michael Jordan*,University of North Carolina
1679,Hakeem Olajuwon*,University of Houston
1707,Michael Adams,Boston College
1709,Benoit Benjamin,Creighton University
1712,Mike Brittain,University of South Carolina
1713,Terry Catledge,University of South Alabama


In [52]:
# 8
filtered_name = (df["Player"] == "Don Carlson")
df[filtered_name]["collage"]

23    University of Minnesota
Name: collage, dtype: object
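
The same lookup can be written as a single `.loc` call that takes the row mask and the column label together, which avoids chained indexing. A small sketch using two rows that appear in the outputs above:

```python
import pandas as pd

# Two rows copied from the outputs shown earlier in this notebook.
sample = pd.DataFrame(
    {
        "Player": ["Don Carlson", "Ed Bartels"],
        "collage": ["University of Minnesota", "North Carolina State University"],
    }
)

# Row mask and column label in one .loc call.
colleges = sample.loc[sample["Player"] == "Don Carlson", "collage"]
print(colleges.tolist())
```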

In [64]:
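# 9
# Only the last expression of a cell is rendered, so wrap this result
# in print() or display() to actually see the averages.
df.groupby(by = "collage")[["height","weight"]].mean()
print("*"*50)
# number of players per collage, largest first
df.groupby("collage")["Player"].count().sort_values(ascending= False)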



**************************************************


collage
University of Kentucky                                        89
University of California, Los Angeles                         86
University of North Carolina                                  67
University of Kansas                                          59
Duke University                                               56
University of Notre Dame                                      51
Indiana University                                            49
Syracuse University                                           49
St. John's University                                         48
University of Louisville                                      46
Michigan State University                                     45
University of Arizona                                         43
University of Minnesota                                       41
Ohio State University                                         39
University of Michigan                                        38
University of Mar
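
The output above shows only the player counts, because a notebook renders just the last expression of a cell. A minimal sketch of the averages the question asks for, printed explicitly and computed on three rows copied from the first output of this notebook:

```python
import pandas as pd

# Three rows copied from the df.head() output near the top of the notebook.
sample = pd.DataFrame(
    {
        "Player": ["Curly Armstrong", "Cliff Barker", "Ralph Beard"],
        "height": [180.0, 188.0, 178.0],
        "weight": [77.0, 83.0, 79.0],
        "collage": [
            "Indiana University",
            "University of Kentucky",
            "University of Kentucky",
        ],
    }
)

# Mean height and weight per collage; print() makes the result visible
# even when more statements follow in the same cell.
avg_by_collage = sample.groupby("collage")[["height", "weight"]].mean()
print(avg_by_collage)
```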

In [69]:
# 10
df.groupby("collage").agg(
    {
        "Player": "count",
        "Age": "mean",
        
    }
).sort_values(by= "Player", ascending= False)

collage,Player,Age
University of Kentucky,89,58.325843
"University of California, Los Angeles",86,59.430233
University of North Carolina,67,60.19403
University of Kansas,59,52.610169
Duke University,56,51.892857
University of Notre Dame,51,67.27451
Indiana University,49,68.469388
Syracuse University,49,53.591837
St. John's University,48,72.854167
University of Louisville,46,60.934783


In [71]:
# 11
unique_collage = df["collage"].unique()
print(f"Unique Collages:\n{unique_collage}\n")
number_of_uniques = df["collage"].nunique()
print(f"Number of unique collages: {number_of_uniques}")

Unique Collages:
['Indiana University' 'University of Kentucky' 'University of Notre Dame'
 'North Carolina State University' 'Marquette University'
 'University of Kansas' 'Temple University' 'Muhlenberg College'
 'University of Denver' 'Western Michigan University'
 "St. John's University" 'Oklahoma State University'
 'Michigan State University' 'Colgate University'
 'Louisiana State University' 'West Texas A&M University'
 'Miami University' nan 'Columbia University'
 'University of Illinois at Urbana-Champaign' 'Seton Hall University'
 'City College of San Francisco' 'University of Minnesota'
 'East Texas State University' 'Canisius College' 'Rice University'
 'University of Wisconsin' 'University of Louisville'
 'Georgetown University' 'University of Wyoming'
 'University of Pennsylvania' 'University of North Carolina'
 'Truman State University' 'New York University' 'University of Colorado'
 'College of William & Mary' 'Butler University'
 'University of Rhode Island' 'Santa Clar
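
Note that the array returned by `unique()` contains `nan`, while `nunique()` leaves missing values out by default, so the two can disagree by one. A short sketch on a hand-made series:

```python
import numpy as np
import pandas as pd

# Small illustrative series with one missing value.
collages = pd.Series(
    ["Indiana University", "University of Kentucky", np.nan, "University of Kentucky"],
    name="collage",
)

n_with_nan = len(collages.unique())            # NaN counts as its own entry
n_default = collages.nunique()                 # NaN excluded by default
n_keep_nan = collages.nunique(dropna=False)    # NaN included on request

print(n_with_nan, n_default, n_keep_nan)
```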

In [None]:
# 12
# Step 1: Count players per collage
collage_counts = df.groupby("collage")["Player"].count()

# Step 2: Filter collages with at least 20 players
collages_with_20_plus = collage_counts[collage_counts >= 20].index

# Step 3: Filter the original df
filtered_df = df[df["collage"].isin(collages_with_20_plus)]

# Step 4: Select columns
filtered_df = filtered_df.loc[:, "Player_ID":"Age"]  # Adjust column names as needed

# Step 5: Display
print(filtered_df)

      Player_ID                   Player  height  weight  \
0             0          Curly Armstrong   180.0    77.0   
1             1             Cliff Barker   188.0    83.0   
2             2            Leo Barnhorst   193.0    86.0   
3             3               Ed Bartels   196.0    88.0   
4             4              Ralph Beard   178.0    79.0   
5             5               Gene Berce   180.0    79.0   
6             6            Charlie Black   196.0    90.0   
7             7              Nelson Bobb   183.0    77.0   
11           11            Harry Boykoff   208.0   102.0   
12           12              Joe Bradley   190.0    79.0   
13           13              Bob Brannum   196.0    97.0   
15           15            Frankie Brian   185.0    81.0   
20           20           Jack Burmaster   190.0    86.0   
21           21             Tommy Byrnes   190.0    79.0   
23           23              Don Carlson   183.0    77.0   
27           27              John Chaney
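
For the question as stated (players per college), `value_counts()` is a shorter route than `groupby(...).count()`: it counts each distinct value and sorts the result in descending order. A minimal sketch on a few college names taken from the outputs above:

```python
import pandas as pd

# College names as they appear in the outputs above.
collages = pd.Series(
    [
        "University of Kentucky",
        "Indiana University",
        "University of Kentucky",
        "University of Notre Dame",
    ],
    name="collage",
)

# Counts per distinct value, most frequent first; NaN is excluded by default.
players_per_collage = collages.value_counts()
print(players_per_collage)

# Keep only collages with at least 2 players (threshold chosen for the sample).
frequent = players_per_collage[players_per_collage >= 2]
print(frequent)
```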

In [99]:
# 13
# na=False treats missing names as non-matches
eb_filter = df["Player"].str.contains("eb", na=False)
df_eb_filter = df[eb_filter]
# print(df_eb_filter)

print(f"Total item of df_eb_filter: {df_eb_filter.shape[0]}")

Total item of df_eb_filter: 17
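
`str.contains` is case-sensitive and treats the pattern as a regular expression by default. The sketch below uses made-up names (they are hypothetical, not from the dataset) to show how `case`, `regex`, and `na` change the match:

```python
import numpy as np
import pandas as pd

# Hypothetical names, including one missing value.
names = pd.Series(["Seb Placeholder", "EB Upper", np.nan, "Joe Bradley"])

# Literal, case-sensitive match; missing names count as non-matches.
mask_sensitive = names.str.contains("eb", case=True, regex=False, na=False)

# Same match, ignoring case.
mask_insensitive = names.str.contains("eb", case=False, regex=False, na=False)

print(names[mask_sensitive].tolist())
print(names[mask_insensitive].tolist())
```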


In [None]:
# 14
startswith_j_filter = (df["Player"].str.startswith("J")) 
startswith_j_df = df[startswith_j_filter].loc[:, "Player":]
print(startswith_j_df)
# print(f"Count of startswith j: {startswith_j_df.count()}")

                  Player  height  weight  \
8        Jake Bornheimer   196.0    90.0   
12           Joe Bradley   190.0    79.0   
18            Jim Browne   208.0   106.0   
20        Jack Burmaster   190.0    86.0   
25           Jake Carter   193.0    88.0   
27           John Chaney   190.0    83.0   
31          Jack Coleman   201.0    88.0   
34           Jack Cotton   201.0    90.0   
38          Jimmy Darden   185.0    77.0   
42            Joe Dolhon   183.0    79.0   
49        Johnny Ezersky   190.0    79.0   
53       Jerry Fleishman   188.0    86.0   
54            Joe Fulks*   196.0    86.0   
65          Joe Graboski   201.0    88.0   
75           John Hargis   188.0    81.0   
85           Joe Holland   193.0    83.0   
97           Jack Kerris   198.0    97.0   
112           John Logan   188.0    79.0   
115    Johnny Macknowski   183.0    81.0   
116         John Mahnken   203.0    99.0   
117          John Mandic   193.0    92.0   
135         Joe Mullaney   183.0