# Data Manipulation in Python
Whether you want to be a data scientist, a data engineer, or just automate some boring tasks with Python, you'll spend a large portion of your time manipulating data.<br>

Data manipulation can take many forms, but in essence it is converting data from one format to another. Think about getting a monthly report of sales numbers from your boss. Your company is global, but you're tasked with finding the sales rep with the most sales in North America. What steps do you need to take to find this information? Maybe something like this:

1. Filter the file for the North America Region
2. Add a look-up of distinct sales reps
3. Add a sumif to determine the total sales by sales rep
4. Sort the total sales from highest to lowest.

All of these steps are manipulating the raw data to get the final answer. <br>
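As a rough sketch, the four steps above might look like this in pandas. The table, its column names (`region`, `rep`, `sales`), and all of the values are invented for illustration; they do not come from a real report.

```python
import pandas as pd

# Hypothetical sales report (all names and numbers are made up)
sales = pd.DataFrame({
    "region": ["North America", "North America", "Europe", "North America"],
    "rep": ["Joe", "John", "Ana", "Joe"],
    "sales": [100, 150, 300, 200],
})

# 1. Filter for the North America region
na_sales = sales[sales["region"] == "North America"]

# 2 & 3. Total the sales for each distinct sales rep (the "sumif" step)
totals = na_sales.groupby("rep")["sales"].sum()

# 4. Sort the totals from highest to lowest
totals = totals.sort_values(ascending=False)

# The first entry is now the rep with the most North America sales
top_rep = totals.index[0]
print(top_rep, totals.iloc[0])
```

Each step produces a new object rather than editing the raw report, so the original data stays available if a later step needs to be redone.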

Section 03 showed how to load a CSV file into Python. When data is loaded with pandas, it is stored in a special data structure: a dataframe. A dataframe is a table made up of rows and columns. Much like an Excel table, a column typically represents an attribute or variable, while a row represents a record: <br>

| Rep ID | Rep Name | Month | Sales (units)|
| --- | --- | --- | --- | 
| 1aa | Joe | January | 100 |
| 1aa | Joe | February | 200 |
| 1bb | John | January | 150 |
| 1bb | John | February | 150 |

In this table, we have 4 records & 4 columns describing each record. <br>
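For illustration, the same table could be built by hand as a dataframe. This is only a sketch: the variable name `example` is an arbitrary choice, and real data would normally be loaded from a file instead.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values
example = pd.DataFrame({
    "Rep ID": ["1aa", "1aa", "1bb", "1bb"],
    "Rep Name": ["Joe", "Joe", "John", "John"],
    "Month": ["January", "February", "January", "February"],
    "Sales (units)": [100, 200, 150, 150],
})

# .shape returns (number of rows, number of columns)
n_rows, n_cols = example.shape
print(n_rows, n_cols)
```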

Pandas is a very popular package, so there is great documentation online and plenty of answered questions on Stack Overflow. There are other ways to hold data in a similar form, such as dictionaries and matrices (or file formats such as parquet). Choosing one depends on your program's needs. In general, a pandas dataframe is a good place to start. <br>

We'll cover these common data manipulation actions: 

- Select / Drop
- Filter
- Distinct
- Order By
- Mutate
- Group By & Summarize
- Merge
- Append / Union

Let's load the data and get started. 

In [17]:
# Import pandas
import pandas as pd

# Define the file path
file_path = '../../2022-fall-python-tutorial/data/2022_boxscores.csv' # '../..' moves up two folders from the current working directory

# Load file
df = pd.read_csv(file_path)

# Print first 3 rows of data
df.head(3)

Unnamed: 0,away_assist_percentage,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
0,50.0,11,15.4,6,98.6,51.3,20,0.421,57,0.386,...,17,0.0,0,"Jon M. Huntsman Center, Salt Lake City, Utah",ABILENE-CHRISTIAN,Abilene Christian,71.0,Home,UTAH,Utah
1,68.8,22,0.0,0,101.3,67.7,21,0.507,73,0.438,...,15,0.0,0,"Reed Arena, College Station, Texas",ABILENE-CHRISTIAN,Abilene Christian,64.3,Home,TEXAS-AM,Texas A&M
2,59.1,13,0.0,0,86.6,76.7,23,0.381,67,0.328,...,18,0.0,0,"College Park Center, Arlington, Texas",TEXAS-ARLINGTON,UT Arlington,72.6,Away,ABILENE-CHRISTIAN,Abilene Christian


### Selecting / Dropping Data
After loading the boxscore dataframe, notice the number on the far left of each row: 0, 1, 2, ... This is called the **index**. It labels each row, which helps pandas perform data manipulation efficiently and also allows you to select certain rows of data. <br>

**Select** a single column of a dataframe using: 
- df['column_name']

**Select** multiple columns of a dataframe using:
- df[['column_name_1', 'column_name_2']]

Notice that selecting a single column by name uses a single bracket '[', while selecting multiple columns uses double brackets '[[' (a list of column names inside the selection brackets). 
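The bracket style also changes the type of the result. The sketch below uses a tiny hand-made dataframe (`games` is a hypothetical stand-in for the boxscore data) to show that single brackets return a pandas Series, while a list of names returns a dataframe, even when the list has only one name.

```python
import pandas as pd

# Small stand-in for the boxscore dataframe (values typed by hand)
games = pd.DataFrame({
    "winning_name": ["Utah", "Texas A&M"],
    "losing_name": ["Abilene Christian", "Abilene Christian"],
})

# Single brackets with one name -> a Series (one column of values)
one_col = games["winning_name"]

# A list of names inside the brackets -> a DataFrame
two_cols = games[["winning_name", "losing_name"]]

# A one-item list still returns a DataFrame, not a Series
one_col_frame = games[["winning_name"]]

print(type(one_col).__name__, type(two_cols).__name__, type(one_col_frame).__name__)
```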

In [18]:
# Select all of the winning names in df
print('Winning Names: ')
df['winning_name'].head(3) # not displayed: by default a notebook cell only shows its last expression

# Select all of the winning names AND losing names in df
print('Winning Names AND Losing Names: ')
df[['winning_name', 'losing_name']].head(3)

Winning Names: 
Winning Names AND Losing Names: 


Unnamed: 0,winning_name,losing_name
0,Utah,Abilene Christian
1,Texas A&M,Abilene Christian
2,Abilene Christian,UT Arlington


**Select** a single row of a dataframe using: 
- df.iloc[row_num]

**Select** multiple rows of a dataframe using: 
- df.iloc[row_start:row_end]

The *row_end* is not inclusive: the selection includes rows up to, but **not** including, that row position. 

**Important** - the row index starts at 0 by default! Always remember that when selecting rows. 
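A minimal sketch of position-based selection, using a hypothetical four-row dataframe named `toy` so the result is easy to check by eye:

```python
import pandas as pd

# Hypothetical four-row dataframe; row positions are 0, 1, 2, 3
toy = pd.DataFrame({"team": ["A", "B", "C", "D"]})

# Position 0 is the 1st row; a single row comes back as a Series
first = toy.iloc[0]

# 1:3 selects positions 1 and 2 (the 2nd and 3rd rows); position 3 is excluded
middle = toy.iloc[1:3]

print(first["team"])
print(list(middle["team"]))
```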

In [19]:
# Select 1st row of data
print('1st row of data: ')
df.iloc[0] # not displayed: by default a notebook cell only shows its last expression

# Select 2nd & 3rd rows of data
print('2nd & 3rd rows of data: ')
df.iloc[1:3] # remember that 1 is the second row - index starts at 0. 

1st row of data: 
2nd & 3rd rows of data: 


Unnamed: 0,away_assist_percentage,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
1,68.8,22,0.0,0,101.3,67.7,21,0.507,73,0.438,...,15,0.0,0,"Reed Arena, College Station, Texas",ABILENE-CHRISTIAN,Abilene Christian,64.3,Home,TEXAS-AM,Texas A&M
2,59.1,13,0.0,0,86.6,76.7,23,0.381,67,0.328,...,18,0.0,0,"College Park Center, Arlington, Texas",TEXAS-ARLINGTON,UT Arlington,72.6,Away,ABILENE-CHRISTIAN,Abilene Christian


**Drop** a column from a dataframe using the 'drop' function: 
- df.drop([column_name_1], inplace = True, axis = 1)

Setting inplace = True modifies the dataframe directly, so you don't need to assign the result to a new dataframe. Setting axis = 1 indicates that a column should be dropped. Alternatively, you can assign the result to a new dataframe: 
- df_new = df.drop([column_name_1], axis = 1)

In [20]:
# Drop the away_assist_percentage column
df.drop(['away_assist_percentage'], inplace=True, axis=1)
df.head(3)

Unnamed: 0,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
0,11,15.4,6,98.6,51.3,20,0.421,57,0.386,22,...,17,0.0,0,"Jon M. Huntsman Center, Salt Lake City, Utah",ABILENE-CHRISTIAN,Abilene Christian,71.0,Home,UTAH,Utah
1,22,0.0,0,101.3,67.7,21,0.507,73,0.438,32,...,15,0.0,0,"Reed Arena, College Station, Texas",ABILENE-CHRISTIAN,Abilene Christian,64.3,Home,TEXAS-AM,Texas A&M
2,13,0.0,0,86.6,76.7,23,0.381,67,0.328,22,...,18,0.0,0,"College Park Center, Arlington, Texas",TEXAS-ARLINGTON,UT Arlington,72.6,Away,ABILENE-CHRISTIAN,Abilene Christian


In [21]:
# Drop away_assists WITHOUT inplace=True: the result is returned but not saved
df.drop(['away_assists'], axis=1)
df.head(3)

Unnamed: 0,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
0,11,15.4,6,98.6,51.3,20,0.421,57,0.386,22,...,17,0.0,0,"Jon M. Huntsman Center, Salt Lake City, Utah",ABILENE-CHRISTIAN,Abilene Christian,71.0,Home,UTAH,Utah
1,22,0.0,0,101.3,67.7,21,0.507,73,0.438,32,...,15,0.0,0,"Reed Arena, College Station, Texas",ABILENE-CHRISTIAN,Abilene Christian,64.3,Home,TEXAS-AM,Texas A&M
2,13,0.0,0,86.6,76.7,23,0.381,67,0.328,22,...,18,0.0,0,"College Park Center, Arlington, Texas",TEXAS-ARLINGTON,UT Arlington,72.6,Away,ABILENE-CHRISTIAN,Abilene Christian


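Notice in the above code that the 'away_assists' column is still in the dataframe. Because inplace defaults to False, drop returned a new dataframe without the column, and since that result was never assigned to a variable, df itself is unchanged. <br>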
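The three ways of calling drop can be compared side by side. This is a sketch on a hypothetical three-column dataframe (`toy`, with made-up columns `a`, `b`, `c`), not on the boxscore data.

```python
import pandas as pd

# Hypothetical dataframe with three columns
toy = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# 1. Result discarded: inplace defaults to False, so toy is unchanged
toy.drop(["a"], axis=1)
cols_after_discard = list(toy.columns)

# 2. Result assigned to a new name: toy is unchanged, toy_new lacks 'a'
toy_new = toy.drop(["a"], axis=1)

# 3. inplace=True: toy itself is modified and nothing needs to be assigned
toy.drop(["b"], axis=1, inplace=True)

print(cols_after_discard, list(toy_new.columns), list(toy.columns))
```

pandas also accepts `toy.drop(columns=["a"])`, which drops the same column without needing the axis argument.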

To drop a row from a dataframe, the same inplace logic applies; however, set axis = 0: 
- df.drop([1], inplace = True, axis = 0)

In the example below, notice that the row with the index = 1 is now removed from the dataframe. 

In [22]:
# Drop the second row (index = 1)
df.drop([1], axis = 0, inplace = True)
df.head(3)

Unnamed: 0,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
0,11,15.4,6,98.6,51.3,20,0.421,57,0.386,22,...,17,0.0,0,"Jon M. Huntsman Center, Salt Lake City, Utah",ABILENE-CHRISTIAN,Abilene Christian,71.0,Home,UTAH,Utah
2,13,0.0,0,86.6,76.7,23,0.381,67,0.328,22,...,18,0.0,0,"College Park Center, Arlington, Texas",TEXAS-ARLINGTON,UT Arlington,72.6,Away,ABILENE-CHRISTIAN,Abilene Christian
3,12,0.0,0,114.1,64.5,20,0.477,43,0.395,17,...,29,0.0,0,"Teague Special Events Center, Abilene, Texas",McMurry\n\t\t\t,McMurry\n\t\t\t,78.0,Home,ABILENE-CHRISTIAN,Abilene Christian


Although not technically required, it is good practice to reset the dataframe index after dropping rows so that it runs 0, 1, 2, ... again. To do so, use the *reset_index* function: 
- df.reset_index(drop = True, inplace = True)

Setting drop = True discards the original index entirely. If it is set to False, a column named 'index' holding the original index is added to the dataframe. 
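A small sketch of both settings, again on a hypothetical dataframe named `toy` rather than the boxscore data:

```python
import pandas as pd

# Hypothetical dataframe; dropping the row labelled 1 leaves index labels 0 and 2
toy = pd.DataFrame({"team": ["A", "B", "C"]})
toy = toy.drop([1], axis=0)

# drop=False keeps the old labels in a new column named 'index'
kept = toy.reset_index(drop=False)

# drop=True throws the old labels away and renumbers from 0
clean = toy.reset_index(drop=True)

print(list(kept.columns), list(kept["index"]))
print(list(clean.columns), list(clean.index))
```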

In [23]:
df.reset_index(drop = True, inplace = True)
df.head(3)

Unnamed: 0,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
0,11,15.4,6,98.6,51.3,20,0.421,57,0.386,22,...,17,0.0,0,"Jon M. Huntsman Center, Salt Lake City, Utah",ABILENE-CHRISTIAN,Abilene Christian,71.0,Home,UTAH,Utah
1,13,0.0,0,86.6,76.7,23,0.381,67,0.328,22,...,18,0.0,0,"College Park Center, Arlington, Texas",TEXAS-ARLINGTON,UT Arlington,72.6,Away,ABILENE-CHRISTIAN,Abilene Christian
2,12,0.0,0,114.1,64.5,20,0.477,43,0.395,17,...,29,0.0,0,"Teague Special Events Center, Abilene, Texas",McMurry\n\t\t\t,McMurry\n\t\t\t,78.0,Home,ABILENE-CHRISTIAN,Abilene Christian


### Filtering Data
A filter is similar to selecting rows - however, it is based on criteria rather than row indices. For example: 
- Select sales reps stationed in the USA
- Select sales reps that had sales in March > 100 units

Filters support the same kinds of comparisons as in Excel: 
- equal 
- does not equal
- greater than / greater than or equal to
- less than / less than or equal to
- is in list
- is NOT in list

To apply a single filter: 
- df[df['column_name'] == value]

In addition to applying one filter at a time, filters can combine multiple criteria using AND (&) / OR (|) logic. Wrap each condition in its own parentheses. For example: 
- df[(df['column_name'] == value) & (df['column_name'] > value)]
- df[(df['column_name'] == value) | (df['column_name'] > value)]

To select all values contained in a list: 
- df[df['column_name'].isin(list)]


To select all values NOT contained in a list: 
- df[~df['column_name'].isin(list)]
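Under the hood, each condition produces a Boolean (True/False) Series with one value per row, and the outer brackets keep only the rows marked True. The sketch below shows this on a hypothetical four-row dataframe (`toy`, with hand-typed values) so the surviving row labels can be checked by eye.

```python
import pandas as pd

# Hypothetical mini version of the boxscore data
toy = pd.DataFrame({
    "winning_name": ["Utah", "Cincinnati", "Air Force", "Cincinnati"],
    "away_assists": [11, 4, 21, 13],
})

# A condition on a column gives one True/False value per row
mask = toy["winning_name"] == "Cincinnati"

# AND: both conditions must be True. The parentheses are needed because
# & and | are evaluated before comparisons such as > and ==
both = toy[mask & (toy["away_assists"] > 5)]

# OR: at least one condition must be True
either = toy[mask | (toy["away_assists"] > 20)]

# isin checks membership in a list; ~ flips True and False
team_list = ["Cincinnati", "Air Force"]
in_list = toy[toy["winning_name"].isin(team_list)]
not_in_list = toy[~toy["winning_name"].isin(team_list)]

print(mask.tolist())
print(list(both.index), list(either.index))
print(list(in_list.index), list(not_in_list.index))
```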

In [24]:
# Filter df for winning_name = 'Cincinnati'
df[df["winning_name"] == 'Cincinnati'].head(3)

Unnamed: 0,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
94,11,6.1,2,118.7,63.3,19,0.395,62,0.371,23,...,18,0.0,0,"Fifth Third Arena, Cincinnati, Ohio",ALABAMA-AM,Alabama A&M,75.4,Home,CINCINNATI,Cincinnati
1039,13,13.3,6,101.4,65.6,21,0.364,59,0.322,19,...,21,0.0,0,"Fifth Third Arena, Cincinnati, Ohio",BRYANT,Bryant,71.9,Home,CINCINNATI,Cincinnati
1784,8,3.1,1,97.0,81.8,27,0.324,54,0.259,14,...,18,0.0,0,"Fifth Third Arena, Cincinnati, Ohio",EVANSVILLE,Evansville,66.9,Home,CINCINNATI,Cincinnati


In [34]:
# Filter df for winning_name != 'Cincinnati' (does not equal)
df[df["winning_name"] != 'Cincinnati'].head(3)

Unnamed: 0,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
0,11,15.4,6,98.6,51.3,20,0.421,57,0.386,22,...,17,0.0,0,"Jon M. Huntsman Center, Salt Lake City, Utah",ABILENE-CHRISTIAN,Abilene Christian,71.0,Home,UTAH,Utah
1,13,0.0,0,86.6,76.7,23,0.381,67,0.328,22,...,18,0.0,0,"College Park Center, Arlington, Texas",TEXAS-ARLINGTON,UT Arlington,72.6,Away,ABILENE-CHRISTIAN,Abilene Christian
2,12,0.0,0,114.1,64.5,20,0.477,43,0.395,17,...,29,0.0,0,"Teague Special Events Center, Abilene, Texas",McMurry\n\t\t\t,McMurry\n\t\t\t,78.0,Home,ABILENE-CHRISTIAN,Abilene Christian


In [27]:
# Filter df for away_assists > 15
print('Greater Than 15: ')
df[df["away_assists"] > 15].head(3)

Greater Than 15: 


Unnamed: 0,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
10,23,0.0,0,118.8,69.6,16,0.571,63,0.492,31,...,17,0.0,0,"UCCU Center, Orem, Utah",UTAH-VALLEY,Utah Valley,63.5,Away,ABILENE-CHRISTIAN,Abilene Christian
18,20,6.7,2,103.7,71.4,15,0.477,66,0.424,28,...,21,0.0,0,"UTRGV Fieldhouse, Edinburg, Texas",TEXAS-PAN-AMERICAN,Texas-Rio Grande Valley,82.2,Away,ABILENE-CHRISTIAN,Abilene Christian
22,21,2.9,1,100.0,64.3,18,0.568,59,0.508,30,...,19,0.0,0,"Wisdom Gym, Stephenville, Texas",TARLETON-STATE,Tarleton State,62.6,Away,ABILENE-CHRISTIAN,Abilene Christian


In [28]:
# Filter df for away_assists >= 9
print('Greater Than or Equal to 9: ')
df[df["away_assists"] >= 9].head(3)


Greater Than or Equal to 9: 


Unnamed: 0,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
0,11,15.4,6,98.6,51.3,20,0.421,57,0.386,22,...,17,0.0,0,"Jon M. Huntsman Center, Salt Lake City, Utah",ABILENE-CHRISTIAN,Abilene Christian,71.0,Home,UTAH,Utah
1,13,0.0,0,86.6,76.7,23,0.381,67,0.328,22,...,18,0.0,0,"College Park Center, Arlington, Texas",TEXAS-ARLINGTON,UT Arlington,72.6,Away,ABILENE-CHRISTIAN,Abilene Christian
2,12,0.0,0,114.1,64.5,20,0.477,43,0.395,17,...,29,0.0,0,"Teague Special Events Center, Abilene, Texas",McMurry\n\t\t\t,McMurry\n\t\t\t,78.0,Home,ABILENE-CHRISTIAN,Abilene Christian


In [32]:
# Filter df for away_assists < 5
print('Less than 5: ')
df[df["away_assists"] < 5].head(3)

Less than 5: 


Unnamed: 0,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
8,3,3.9,2,138.3,56.3,18,0.326,46,0.261,12,...,35,0.0,0,"Teague Special Events Center, Abilene, Texas",Howard Payne\n\t\t\t,Howard Payne\n\t\t\t,80.8,Home,ABILENE-CHRISTIAN,Abilene Christian
24,4,15.8,6,96.3,73.0,27,0.325,40,0.3,12,...,17,0.0,0,"Teague Special Events Center, Abilene, Texas",LAMAR,Lamar,79.7,Home,ABILENE-CHRISTIAN,Abilene Christian
34,1,11.4,4,109.1,78.1,25,0.375,48,0.375,18,...,19,0.0,0,"Ocean Center, Daytona Beach, Florida",HOLY-CROSS,Holy Cross,65.6,Home,AIR-FORCE,Air Force


In [31]:
# Filter df for away_assists <= 2
print('Less than or Equal to 2: ')
df[df["away_assists"] <= 2].head(3)

Less than or Equal to 2: 


Unnamed: 0,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
34,1,11.4,4,109.1,78.1,25,0.375,48,0.375,18,...,19,0.0,0,"Ocean Center, Daytona Beach, Florida",HOLY-CROSS,Holy Cross,65.6,Home,AIR-FORCE,Air Force
1765,2,2.7,1,119.4,55.3,21,0.462,52,0.442,23,...,20,0.0,0,"Grand Canyon University Arena, Phoenix, Arizona",CHICAGO-STATE,Chicago State,67.5,Home,GRAND-CANYON,Grand Canyon
3241,2,2.8,1,104.2,70.4,19,0.391,64,0.344,22,...,25,0.0,0,"EagleBank Arena, Fairfax, Virginia",STONY-BROOK,Stony Brook,71.4,Home,GEORGE-MASON,George Mason


In [36]:
# Define team list
team_list = ['Cincinnati', 'Air Force']

# Filter df for winning_name is in the team_list
df[df['winning_name'].isin(team_list)].head(3)

Unnamed: 0,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
32,13,8.1,3,92.1,75.0,24,0.582,49,0.49,24,...,15,0.0,0,"Donald W. Reynolds Center, Tulsa, Oklahoma",TULSA,Tulsa,63.1,Away,AIR-FORCE,Air Force
33,7,5.0,1,95.3,79.3,23,0.429,49,0.408,20,...,13,0.0,0,"Clune Arena , Colorado Springs, Colorado",TEXAS-SOUTHERN,Texas Southern,63.8,Home,AIR-FORCE,Air Force
34,1,11.4,4,109.1,78.1,25,0.375,48,0.375,18,...,19,0.0,0,"Ocean Center, Daytona Beach, Florida",HOLY-CROSS,Holy Cross,65.6,Home,AIR-FORCE,Air Force


In [39]:
# Filter df for winning_name is NOT in the team_list
df[~df['winning_name'].isin(team_list)].head(3)

Unnamed: 0,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
0,11,15.4,6,98.6,51.3,20,0.421,57,0.386,22,...,17,0.0,0,"Jon M. Huntsman Center, Salt Lake City, Utah",ABILENE-CHRISTIAN,Abilene Christian,71.0,Home,UTAH,Utah
1,13,0.0,0,86.6,76.7,23,0.381,67,0.328,22,...,18,0.0,0,"College Park Center, Arlington, Texas",TEXAS-ARLINGTON,UT Arlington,72.6,Away,ABILENE-CHRISTIAN,Abilene Christian
2,12,0.0,0,114.1,64.5,20,0.477,43,0.395,17,...,29,0.0,0,"Teague Special Events Center, Abilene, Texas",McMurry\n\t\t\t,McMurry\n\t\t\t,78.0,Home,ABILENE-CHRISTIAN,Abilene Christian


In [40]:
# Filter for winning_name = 'Cincinnati' AND away_assists > 5
df[(df['winning_name'] == 'Cincinnati') & (df['away_assists'] > 5)].head(3)

Unnamed: 0,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
94,11,6.1,2,118.7,63.3,19,0.395,62,0.371,23,...,18,0.0,0,"Fifth Third Arena, Cincinnati, Ohio",ALABAMA-AM,Alabama A&M,75.4,Home,CINCINNATI,Cincinnati
1039,13,13.3,6,101.4,65.6,21,0.364,59,0.322,19,...,21,0.0,0,"Fifth Third Arena, Cincinnati, Ohio",BRYANT,Bryant,71.9,Home,CINCINNATI,Cincinnati
1784,8,3.1,1,97.0,81.8,27,0.324,54,0.259,14,...,18,0.0,0,"Fifth Third Arena, Cincinnati, Ohio",EVANSVILLE,Evansville,66.9,Home,CINCINNATI,Cincinnati


In [41]:
# Filter for winning_name = 'Cincinnati' OR away_assists > 20
df[(df['winning_name'] == 'Cincinnati') | (df['away_assists'] > 20)].head(3)

Unnamed: 0,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
10,23,0.0,0,118.8,69.6,16,0.571,63,0.492,31,...,17,0.0,0,"UCCU Center, Orem, Utah",UTAH-VALLEY,Utah Valley,63.5,Away,ABILENE-CHRISTIAN,Abilene Christian
22,21,2.9,1,100.0,64.3,18,0.568,59,0.508,30,...,19,0.0,0,"Wisdom Gym, Stephenville, Texas",TARLETON-STATE,Tarleton State,62.6,Away,ABILENE-CHRISTIAN,Abilene Christian
94,11,6.1,2,118.7,63.3,19,0.395,62,0.371,23,...,18,0.0,0,"Fifth Third Arena, Cincinnati, Ohio",ALABAMA-AM,Alabama A&M,75.4,Home,CINCINNATI,Cincinnati


**Important** - in all of the above examples, we are only displaying the rows that meet the filter criteria; df itself is unchanged. To keep the filtered rows, assign them to a new dataframe: 
- df_2 = df[df['column_name'] == value]

When defining a new dataframe based on filtered data, it is best practice to reset the index. 

In [43]:
# Define new dataframe for all records where winning_name = 'Cincinnati'
df_2 = df[df['winning_name'] == 'Cincinnati'].reset_index(drop = True)
df_2.head(3)

Unnamed: 0,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,...,home_two_point_field_goals,home_win_percentage,home_wins,location,losing_abbr,losing_name,pace,winner,winning_abbr,winning_name
0,11,6.1,2,118.7,63.3,19,0.395,62,0.371,23,...,18,0.0,0,"Fifth Third Arena, Cincinnati, Ohio",ALABAMA-AM,Alabama A&M,75.4,Home,CINCINNATI,Cincinnati
1,13,13.3,6,101.4,65.6,21,0.364,59,0.322,19,...,21,0.0,0,"Fifth Third Arena, Cincinnati, Ohio",BRYANT,Bryant,71.9,Home,CINCINNATI,Cincinnati
2,8,3.1,1,97.0,81.8,27,0.324,54,0.259,14,...,18,0.0,0,"Fifth Third Arena, Cincinnati, Ohio",EVANSVILLE,Evansville,66.9,Home,CINCINNATI,Cincinnati
