# How to Create a Dataframe Comparing Sports Costs and Revenues
This notebook will walk you through a tutorial on how to gather a subset of data relating to School Sport Equity using Google Colab. We are asking ourselves the question: Does Football or Basketball produce more revenue when comparing UNC Chapel Hill and Duke University? Also, which sport has more costs at those schools?
## Overview of Tutorial
Be sure that you have access to Google Colab on your computer to get started. Once you are in Colab, start a new notebook.

Outline of tutorial
1. Mounting Google Drive
2. Importing the pandas package and numpy
3. Creating a dataframe
4. Exploring our dataframe's attributes
5. Filtering our dataframe
6. Exporting our new .csv file

## Mounting Google Drive
The best way to set up persistent access to your data with Colab is to mount your Google Drive in the notebook and make sure that the .csv files you're working with are stored there.

We can do this by running the following code:

`from google.colab import drive
drive.mount('/content/gdrive')`


## Importing the Pandas Package and Numpy
### Pandas

Pandas allows us to store our data as dataframes with familiar features like rows, columns, and headers (similar to Microsoft Excel). Pandas will let you manage and manipulate large sets of tabular data so that it's easier to find the particular pieces of information you're interested in.

### Numpy
Numpy will help us do math on our data more easily.

### Importing
Begin by importing both packages: type `import numpy as np` and `import pandas as pd` in a cell, then press Shift + Return/Enter to run the code, as shown below:

`import numpy as np
import pandas as pd`
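To get a feel for what these two packages provide, here is a minimal sketch that builds a tiny dataframe by hand. The school names and numbers are made up for illustration and do not come from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only (not from All_Data_Combined_2022).
toy = pd.DataFrame({
    "School": ["School A", "School B", "School C"],
    "Revenue": [100.0, 250.0, 175.0],
})

# pandas stores the table; numpy supplies the math.
average_revenue = np.mean(toy["Revenue"])
print(toy)
print("Average revenue:", average_revenue)
```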


## Create your dataframe
###### By now, you should have downloaded the csv file "All_Data_Combined_2022". NOTE: Make sure that your csv file is saved in the same working directory as the .ipynb notebook file that you will use.

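Keep in mind that Colab does not set your working directory to the folder where the .ipynb is saved; it starts in `/content`. To use a relative path, first change into the Drive folder that holds your notebook and csv file (for example with the `%cd` magic command), or pass the full path that begins with `/content/gdrive`.
Once a file is located in your working directory, its relative path is just the name of the file!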
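One way to check where relative paths resolve, and to build a full path to the file on the mounted drive, is sketched below. It assumes the drive was mounted at `/content/gdrive` as shown earlier and that your files appear under a `MyDrive` folder; `sports_tutorial` is a hypothetical folder name, so replace it with your own.

```python
import os

# Where relative file names are resolved from right now.
print(os.getcwd())

# Assumptions: Drive is mounted at /content/gdrive, and "sports_tutorial"
# is a hypothetical folder inside your Drive. Use your own folder name,
# and add the file extension if your file name has one.
mount_point = "/content/gdrive"
folder = "sports_tutorial"
csv_path = os.path.join(mount_point, "MyDrive", folder, "All_Data_Combined_2022")
print(csv_path)
```

The resulting `csv_path` can be passed to `pd.read_csv` in place of a bare file name.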

## How To Code Your Dataframe
`pd.read_csv` reads the tabular data from a Comma Separated Values (csv) file into a dataframe object that we'll define as df.

To create our dataframe, we'll define the object `df` by calling the `pd.read_csv()` function on our data file, inserting the relative file path (the exact file name, including its extension if it has one) into the parentheses.

Your code should look like this:

`df = pd.read_csv("All_Data_Combined_2022")`
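If you want to try `pd.read_csv` without the real file, the sketch below reads a few made-up rows from a text string instead. The rows are invented for illustration; only the column-name style mirrors the dataset.

```python
import io
import pandas as pd

# Made-up rows standing in for the real file.
csv_text = """School,Football Total Revenue,Basketball Total Revenue
School A,1000,400
School B,2500,900
"""

# read_csv accepts a file path or any file-like object.
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df.head())
```

The first line of the text becomes the column headers, and every later line becomes one row.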









## Exploring The Attributes of Your Dataframe

A good first step in exploring our dataframe is to examine some of its basic attributes. Attributes contain values that provide helpful information about the dataframe and guide how we interact with it. In pandas, we access attributes with the following syntax:

`<DataFrame name>.<attribute name>`

We can use the `.shape` attribute to determine how many rows and columns (in that order) are available. The `.size` attribute gives us the number of cells in the dataframe (rows * columns).

`df.shape`

`(1329, 710)`

The dataframe has 1329 rows and 710 columns. With a dataframe this large, filtering will be necessary to get the specific information we want.
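The relationship between `.shape` and `.size` can be checked on a toy dataframe, as in this small sketch (the values are made up):

```python
import pandas as pd

small = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})  # toy data

rows, cols = small.shape   # rows first, then columns
cells = small.size         # number of cells, rows * columns
print(rows, cols, cells)

# The same arithmetic for the shape reported above.
print(1329 * 710)
```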



#### Other useful attributes include:

`.columns` provides the column names for the Dataframe

`df.columns`

We can use indexing to select the columns and rows that we want to explore further.

The general pattern is `df[columnnames][rows]`

When coding this, we use square brackets. We put quotation marks around the names of the columns that we want to explore, and we use a colon to give a range of rows.

Example: Let's run the code `df["Male Undergraduates"][0:9]`


The output you will receive lists the first nine values of the column (rows 0 through 8) and ends with this line:

`Name: Male Undergraduates, dtype: float64`
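Here is a small sketch of the same indexing pattern on toy data. It also shows `.iloc`, which is a more explicit way to ask pandas for rows by position. The values and the second column name are made up.

```python
import pandas as pd

# Toy values, not from the dataset.
demo = pd.DataFrame({
    "Male Undergraduates": [10.0, 20.0, 30.0, 40.0],
    "Female Undergraduates": [15.0, 25.0, 35.0, 45.0],
})

first_two = demo["Male Undergraduates"][0:2]        # the tutorial's style
same_rows = demo["Male Undergraduates"].iloc[0:2]   # explicit positional slice

print(first_two)
print(first_two.equals(same_rows))
```

Note that the end of the range is not included, so `0:2` returns rows 0 and 1.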

#### Let's start creating a dataframe for our question

We can start by finding where Chapel Hill and Duke are located in the dataframe.

Duke is located at:

`df[288:289]`

Chapel Hill is located at:

`df[1120:1121]`
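If you do not already know the row numbers, one possible way to find them is to search a name column with a boolean mask. The sketch below uses toy data, and `Institution Name` is a hypothetical column name; check `df.columns` for the actual name in your file.

```python
import pandas as pd

# Toy table; "Institution Name" is a hypothetical column name.
schools = pd.DataFrame({
    "Institution Name": [
        "School A",
        "Duke University",
        "School B",
        "University of North Carolina at Chapel Hill",
    ],
    "Football Total Revenue": [1.0, 2.0, 3.0, 4.0],
})

# Boolean mask: True where the name contains the search text.
duke_mask = schools["Institution Name"].str.contains("Duke", regex=False)
unc_mask = schools["Institution Name"].str.contains("Chapel Hill", regex=False)

duke_rows = schools[duke_mask]
unc_rows = schools[unc_mask]
print(duke_rows.index.tolist(), unc_rows.index.tolist())
```

The index values printed at the end are the row numbers you would then use in a slice such as `df[288:289]`.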

To find Chapel Hill's total basketball and football revenue, we could run the code below:

`df[["Basketball Total Revenue","Football Total Revenue"]][1120:1121]`

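To cover the full context of our question, we also need the expense columns. This is what the code looks like for UNC:

`df[["Basketball Total Revenue","Football Total Revenue","Basketball Total Expenses","Football Total Expenses"]][1120:1121]`

For Duke, all you need to do is change the row range to Duke's rows and keep the rest the same.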



### Exporting New Dataframe

Before we can export our new dataframe as a .csv file, we have to assign our subset to a new variable.

We will store the UNC subset in a variable named `UNC`. We also add `.copy()` so that the new variable holds its own independent dataframe rather than a selection that may refer back to the original dataframe. We will add this to the line of code we just ran:

`UNC = df[["Basketball Total Revenue","Football Total Revenue","Basketball Total Expenses","Football Total Expenses"]][1120:1121].copy()`

`Duke = df[["Basketball Total Revenue","Football Total Revenue","Basketball Total Expenses","Football Total Expenses"]][288:289].copy()`
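To see what `.copy()` buys us, the sketch below edits a copied subset of a toy dataframe and confirms that the original is left alone. The values are made up.

```python
import pandas as pd

# Toy data, not from the dataset.
original = pd.DataFrame({"Football Total Revenue": [100.0, 200.0, 300.0]})

# Same pattern as above: pick columns, slice rows, then copy.
subset = original[["Football Total Revenue"]][1:2].copy()

# Edit the copy only.
subset.loc[1, "Football Total Revenue"] = 0.0

print(subset)
print(original)
```

Without `.copy()`, some pandas versions may show a `SettingWithCopyWarning` when you edit the subset, because it is unclear whether the edit should reach the original.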

## Exporting as a .csv File

The method that we will use is `.to_csv()`. We put the file name that we want, including the extension (.csv), in the parentheses.

For our dataframes we will code `UNC.to_csv("UNC.csv")` and `Duke.to_csv("Duke.csv")`. This will export each .csv file to our working directory.

By default, the exported .csv includes a column of the index numbers that pandas created when `read_csv` loaded the original file.

We will put the `index=False` argument inside the parentheses to tell pandas not to add the index numbers to our new .csv file.

`UNC.to_csv("UNC.csv", index=False)`

`Duke.to_csv("Duke.csv", index=False)`
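The effect of `index=False` can be previewed without writing a file, since `.to_csv()` returns the text when no file name is given. This is a minimal sketch on a toy one-row dataframe; the value is made up, and the index 1120 simply mimics a row kept from a larger table.

```python
import pandas as pd

# Toy one-row subset with a leftover row number as its index.
unc_demo = pd.DataFrame({"Football Total Revenue": [100.0]}, index=[1120])

with_index = unc_demo.to_csv()                # returns a string when no path is given
without_index = unc_demo.to_csv(index=False)  # same data, index column dropped

print(with_index)
print(without_index)
```

In the first printout, each row starts with the index number; in the second, only the column values remain.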

## All done!

If you look in your directory, you should see both .csv files.

If they are not in your directory, make sure that all the code shown above was entered properly.

## Our Findings:
First Question: Does Football or Basketball produce more revenue when comparing UNC Chapel Hill and Duke University?

When looking at our data subsets, Football produces much more money for both schools than Basketball does. UNC Chapel Hill produces more Football revenue, with $67,249,397.00 compared to Duke's $60,949,915.00.

Second Question: Which sport has more costs at those schools?


Similar to the answer above, Football also costs more at both of these schools. UNC Chapel Hill spent $44,472,525.00, compared to Duke's $39,371,987.00.
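As a rough sketch of how the two schools could be placed side by side in one dataframe, the code below types in the figures quoted above and subtracts Duke's from UNC's. It assumes the expense figures refer to Football, and it leaves out Basketball because those numbers are not quoted here.

```python
import pandas as pd

# Football figures quoted in the findings above (US dollars).
football = pd.DataFrame(
    {
        "Football Total Revenue": [67_249_397.00, 60_949_915.00],
        "Football Total Expenses": [44_472_525.00, 39_371_987.00],
    },
    index=["UNC", "Duke"],
)

# Difference between the two schools for each column (UNC minus Duke).
gap = football.loc["UNC"] - football.loc["Duke"]
print(football)
print(gap)
```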