<a href="https://colab.research.google.com/github/The-Geology-Guy/ncaa_select_picks/blob/main/predict_picks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### 1. March Madness Bracket Picks Generator

Every year, millions of people around the world tune in to March Madness, the NCAA Men's Basketball tournament. The bracket modeled here involves 64 teams, each with the mission to win every game and be crowned the best in men's collegiate basketball.

![March Madness logo](https://upload.wikimedia.org/wikipedia/en/thumb/2/28/March_Madness_logo.svg/220px-March_Madness_logo.svg.png)
###### Image source: https://upload.wikimedia.org/wikipedia/en/thumb/2/28/March_Madness_logo.svg/220px-March_Madness_logo.svg.png

The madness does not stay on the court: millions of people fill out brackets each year in the hope that theirs will be the fabled perfect bracket. A perfect bracket is, of course, wildly improbable, as there are $2^{63} \approx 9.2\times10^{18}$ possible outcomes for the 63 games of a 64-team bracket. Still, the infinitesimal probability does not stop people from filling out multiple brackets, hoping to have at least the best one in their office!

- **Round 1:** 64 teams split into 4 regions with 16 teams in each region (_region names change year to year_)
- **Round 2:** 32 teams remain in total, split between the 4 regions with 8 teams remaining in each region
- **Regional Semifinals (Sweet 16):** 16 teams remain in total, split between the 4 regions with 4 teams remaining in each region
- **Regional Finals (Elite 8):** 8 teams remain in total, split between the 4 regions with 2 teams remaining in each region
- **National Semifinals (Final 4):** the winner of each region goes head-to-head with the winner of the opposing region, e.g. EAST vs. WEST and SOUTH vs. MIDWEST
- **National Championship:** only 2 teams remain and will battle to be crowned the best basketball team in NCAA Men's Collegiate Basketball

In this notebook, we will explore a way to make predictions based on coin flips. In essence, the model takes the seeding pairs of matchups and pits them against each other in a game of chance.

For example, if a 5 seed were to play a 12 seed, then the following would occur:

- 17 total flips of a coin
- If there are more than 5 occurrences of 'Heads' in the set of coin flips, then the **Higher** seed (5) wins.
- Otherwise, that is with 12 or more occurrences of 'Tails' in the set of coin flips, the **Lower** seed (12) wins.

This basic scenario will determine each match up. However, there are some other factors to be taken into account when moving forward with the tournament.
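
To get a feel for how strongly this rule favors the higher seed, the chance of each outcome can be computed directly from the binomial distribution. The sketch below is a side calculation rather than part of the notebook's model; the helper name `high_seed_win_probability` is hypothetical, and it assumes a fair coin and that the first argument is the numerically smaller seed.

```python
from math import comb

def high_seed_win_probability(high_seed, low_seed):
    """Probability that the number of heads exceeds high_seed
    in (high_seed + low_seed) fair coin flips."""
    flips = high_seed + low_seed
    # Count the flip sequences with strictly more heads than the higher seed's number
    favorable = sum(comb(flips, heads) for heads in range(high_seed + 1, flips + 1))
    return favorable / 2 ** flips

# The 5 vs. 12 example from the list above
five_vs_twelve = high_seed_win_probability(5, 12)
print(round(five_vs_twelve, 4))
```

For the 5 vs. 12 example this works out to roughly 0.93, so under this rule the 12 seed pulls off the upset only about 7% of the time.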

To begin this process, we will first need to import the packages we will be using throughout this predictor. The two packages to import are numpy and random.

In [11]:
# Import numpy
import numpy as np

# Import random
import random as rd

### 2. Pick the best team

As mentioned in the introduction, the best team for each matchup is determined by the number of heads in a list of trials. The number of trials is equal to the sum of the two seeds put head to head - _this will become more clear as you progress through the notebook._ 

Using a while loop, the function runs a number of trials equal to the sum of the seeds and stores the outcomes in a list named `results`. Each flip is decided by the output of `.randint()`: if it equals 1, the coin is deemed to have landed on heads; if it equals 2, it is deemed tails.

Once the list `results` is complete, the number of heads is counted and compared to the seeds. If the number of heads is greater than the seed number of the higher seed (the minimum of the two seed numbers), the function returns `high_seed`; otherwise it returns `low_seed`.
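
The same rule can also be written more compactly. The following is only an illustrative sketch, not the notebook's implementation: the names `count_heads` and `pick_by_flips` and the optional `rng` parameter are assumptions added here so that the random source can be seeded for a repeatable check.

```python
import random

def count_heads(num_flips, rng=random):
    """Flip a fair coin num_flips times and count the heads (randint == 1)."""
    return sum(rng.randint(1, 2) == 1 for _ in range(num_flips))

def pick_by_flips(seed_a, seed_b, rng=random):
    """Return the winning seed using the sum-of-seeds coin flip rule."""
    # The numerically smaller seed is the favored (higher) seed
    better, worse = min(seed_a, seed_b), max(seed_a, seed_b)
    heads = count_heads(better + worse, rng)
    return better if heads > better else worse

# Repeat the 5 vs. 12 matchup many times with a seeded generator
rng = random.Random(0)
trials = 2000
wins_for_five = sum(pick_by_flips(5, 12, rng) == 5 for _ in range(trials))
print(wins_for_five / trials)
```

Running many trials like this is a quick way to see how often the favored seed advances under the rule.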

In [12]:
def pick_best(high_seed, low_seed):
    
    '''pick_best takes the high seed and the low seed and determines which 
    seed wins the round. The trial length is determined by the sum of the two seeds.'''
    
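    # Treat the numerically smaller seed as the higher (favored) seed, whatever the argument order
    high_seed, low_seed = min(high_seed, low_seed), max(high_seed, low_seed)
    
    # Flip the coin a number of times equal to the sum of the low and high seeds
    results = []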
    trial = 0
    while trial < (low_seed + high_seed):
        coin = rd.randint(1, 2)
        if coin == 1:
            results.append('H')
            trial += 1
        elif coin == 2:
            results.append('T')
            trial += 1
    
    # Determine the winner based on the results list
    if results.count('H') > min(high_seed, low_seed):
        return high_seed
    else:
        return low_seed

### 3. Determine winners of round 1

Round 1 has 32 matchups between 64 teams in only a few days; at this point in the month, it is truly madness. For those spectators not on the court but trying to win their company or school bracket competition, this is where the vast majority of brackets lose any chance at perfection.

The first part of this function assigns the high and low seeds to two different variables, `high_points` and `low_points`. For clarification, the high seeds are ordered from low to high, 1 through 8, while the low seeds are in reverse order, 16 through 9. This way, when they are zipped together, the matchups are properly aligned in the resulting tuple, which should look like the following: $(1, 16), (2, 15), ... , (7, 10), (8, 9)$

Next, the function loops through the tuples and applies the `pick_best` function. 

Lastly, to stay in line with how the bracket is constructed (_reference this to see how the winners proceed_), we cut the list of winners in half, reverse the second half, and zip the two halves together, returning a tuple containing the matchups for the next round.
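
The pairing logic can be traced in isolation. The sketch below assumes, purely for illustration, that the favored seed wins every game (there is no coin flipping here), and it uses plain `range` objects in place of the numpy arrays.

```python
# Seeds 1 through 8 paired with seeds 16 down to 9
high_points = list(range(1, 9))
low_points = list(range(16, 8, -1))
matchups = list(zip(high_points, low_points))
print(matchups)

# Assumption for illustration only: the favored seed always wins
winners = [min(pair) for pair in matchups]

# Pair the first half of the winners with the reversed second half
halfway = len(winners) // 2
next_round = list(zip(winners[:halfway], reversed(winners[halfway:])))
print(next_round)
```

Reversing the second half is what makes the winner of the 1 vs. 16 game meet the winner of the 8 vs. 9 game, as on a printed bracket.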

In [13]:
def round_1():
    
    '''round_1 initiates the seeding pairs for a region (seeds 1 through 16),
    zips the high and low seeds together into matchup tuples, runs the
    pick_best function on each matchup, and returns the winners paired
    into the matchups for the next round.'''
    
    # Initiate the seed pairings for the region (1 through 16) as plain Python ints,
    # so the displayed results read (1, 8) rather than numpy scalar reprs
    high_points = np.arange(1, 9).tolist()
    low_points = np.arange(16, 8, -1).tolist()
    
    # Zip together the high and low points and save to the variable matchups as a tuple
    matchups = tuple(zip(high_points, low_points))
    
    # Pick the winners of each matchup by looping through the matchups with the pick_best function
    results_rd = []
    i = 0
    while i < len(matchups):
        winner = pick_best(matchups[i][0], matchups[i][1])
        results_rd.append(winner)
        i += 1
        
    # Split the winners in half and pair the first half with the reversed second half for the next round
    halfway = int(len(results_rd) / 2)
    round_1_winners = tuple(zip(results_rd[:halfway], reversed(results_rd[halfway:])))
    return round_1_winners

### 4. Determine winners of round 2

Much like the round 1 function, the round 2 function is named `round_2`. This round determines the Sweet 16: _before this function runs, 8 teams remain in each region; afterwards, 4 teams per region move on, and 4 teams in each of 4 regions makes 16 teams remaining._

The big difference between `round_1` and `round_2` is that `round_2` does not need to initiate the list of teams for each region and their respective matchups. Given this difference, `round_2` recreates `round_1`, except that it reads its matchups from the tuple of pairs `match_list`.
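
Because the two rounds share the same loop and pairing step, the common part could be factored into one helper. This is a hedged sketch of that idea and not how the notebook is organized; `play_round` and `favorite_wins` are hypothetical names, and `favorite_wins` stands in for `pick_best` so the result is deterministic.

```python
def play_round(match_list, pick_winner):
    """Pick a winner for every (seed, seed) pair, then pair the winners
    for the next round: first half against the reversed second half."""
    winners = [pick_winner(seed_a, seed_b) for seed_a, seed_b in match_list]
    halfway = len(winners) // 2
    return tuple(zip(winners[:halfway], reversed(winners[halfway:])))

def favorite_wins(seed_a, seed_b):
    """Stand-in for pick_best: the numerically smaller seed always wins."""
    return min(seed_a, seed_b)

# Example input shaped like the output of round_1
round_1_output = ((1, 8), (2, 7), (3, 6), (4, 5))
round_2_output = play_round(round_1_output, favorite_wins)
print(round_2_output)
```

Passing the picking function as an argument keeps the pairing logic separate from the coin flipping, which makes each piece easier to check on its own.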

In [14]:
def round_2(match_list):
    
    '''This runs the pick_best function for the matchups determined
    from the previous round (round_1) and returns the winners paired
    into the matchups for the next round.'''
    
    # Pick the winners of each matchup by looping through the matchups with the pick_best function
    matchups = match_list
    results_rd = []
    i = 0
    while i < len(matchups):
        winner = pick_best(matchups[i][0], matchups[i][1])
        results_rd.append(winner)
        i += 1
        
    # Split the winners in half and pair the first half with the reversed second half for the next round
    halfway = int(len(results_rd) / 2)
    round_2_winners = tuple(zip(results_rd[:halfway], reversed(results_rd[halfway:])))
    return round_2_winners

### 5. Determine winners of the Sweet 16

Unlike the previous rounds, the Sweet 16 function returns a plain list of results rather than a tuple of pairs. This is because only 4 teams are left in each region, so only 2 teams advance to the regional finals (the `Elite 8`), and the order of these two teams does not matter.

In [15]:
def sweet_16(match_list):
    
    '''This runs the pick_best function for the matchups determined
    from the previous round (round_2) and returns a list of the 
    winners of the round.'''
    
    # Pick the winners of each matchup by looping through the matchups with the pick_best function
    matchups = match_list
    results_rd = []
    i = 0
    while i < len(matchups):
        winner = pick_best(matchups[i][0], matchups[i][1])
        results_rd.append(winner)
        i += 1
    
    # This puts the results into a list called sweet_16_winners
    sweet_16_winners = list(results_rd)
    return sweet_16_winners

### 6. Determine winners of the Elite 8

The Elite 8 is made up of all the regional finalists. The finalists from the 4 regions go head-to-head, where the victor of the region will move on to the `Final Four`.

This function is much simpler than the others: all that is needed is to take the list from the `sweet_16` function and pick the best of the two teams remaining using the `pick_best` function.
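
The change in data shape across these two rounds can be seen with a small trace. As a simplifying assumption, the sketch below lets the favored seed win every game instead of flipping coins, and `favorite_wins` is a hypothetical stand-in for `pick_best`.

```python
def favorite_wins(seed_a, seed_b):
    """Stand-in for pick_best: the numerically smaller seed always wins."""
    return min(seed_a, seed_b)

# Shaped like the output of round_2: a tuple of two (seed, seed) pairs
round_2_output = ((1, 4), (2, 3))

# Sweet 16: one winner per pair, collected in a flat list
regional_finalists = [favorite_wins(a, b) for a, b in round_2_output]
print(regional_finalists)

# Elite 8: the two finalists play each other, leaving a single seed
regional_champion = favorite_wins(regional_finalists[0], regional_finalists[1])
print(regional_champion)
```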

In [16]:
def elite_8(match_list):
    
    '''This function applies the pick_best function to the
    winners of the sweet 16 round.'''
    
    matchups = match_list
    
    # Apply the function to the matchup in the elite 8
    elite_winner = pick_best(matchups[0], matchups[1])
    return elite_winner

### 7. Pick the winners of the Final Four

All of the regions have now found their respective representatives to participate in the semi-finals. There are 4 teams remaining, but now the numbers are not unique. For instance, if both the regional winners are seeded at `2`, how do we pick the best one?

We will need to re-purpose the `pick_best` function into a function that picks the two teams moving forward to the finals. This time, we include the regions along with the teams. This function is called `pick_final`.

The other question surrounding `pick_final` is: what if there is a tie, for example if both teams are `1` seeds? In these situations, both teams have a 50/50 chance of moving forward: the team that is first in the list wins with heads, and the second team wins with tails.
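
The tie rule described above amounts to a single fair flip. A minimal sketch, assuming each team is represented as a `(region, seed)` tuple; the name `break_tie` and the optional `rng` parameter are hypothetical and exist only for this illustration.

```python
import random

def break_tie(first_team, second_team, rng=random):
    """One fair flip: heads (1) advances first_team, tails (2) advances second_team."""
    return first_team if rng.randint(1, 2) == 1 else second_team

# Two 1 seeds meeting in the semifinals, repeated with a seeded generator
rng = random.Random(42)
trials = 4000
first_wins = sum(
    break_tie(('SOUTH', 1), ('MIDWEST', 1), rng) == ('SOUTH', 1)
    for _ in range(trials)
)
print(first_wins / trials)
```

Over many repetitions the first team should advance about half the time, which is the behavior the paragraph describes.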

In [17]:
def pick_final(high_seed, low_seed, region_name):
        
    '''pick_final takes the high seed and the low seed and determines which 
    seed wins the round. The trial length is determined by the sum of the two seeds.
    The region may be South, East, Midwest, or West.'''
    
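    # Equal seeds: one fair flip decides (heads: first team, tails: second team)
    if high_seed == low_seed:
        return (region_name[0], high_seed) if rd.randint(1, 2) == 1 else (region_name[1], low_seed)
    
    # Put the numerically smaller (favored) seed first, keeping each region with its seed
    if high_seed > low_seed:
        high_seed, low_seed = low_seed, high_seed
        region_name = region_name[::-1]
    
    # Flip the coin a number of times equal to the sum of the low and high seeds
    results = []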
    trial = 0
    while trial < (low_seed + high_seed):
        coin = rd.randint(1, 2)
        if coin == 1:
            results.append('H')
            trial += 1
        elif coin == 2:
            results.append('T')
            trial += 1
    
    # Determine the winner based on the results list, with the winner's region included
    if results.count('H') > min(high_seed, low_seed):
        return region_name[0], high_seed
    else:
        return region_name[1], low_seed

### 8. Determine which seeds go to the championship

The top four teams are all that remain, each with a shot at making it to the championship round. Four teams stand, but in this round, two teams will fall, and two teams will be headed to the most prized game in NCAA basketball.

In order to get the two winners, the `pick_final` function will be used. You will notice that the `i` and `j` variables are adding 2 to the total at the end of the while statement - this is so that we can pair up the `0` and `1` positions in the regions list and the `2` and `3` positions in the list - which plays out to be `East` vs. `West` and `South` vs. `Midwest` in the 2019 bracket. 

> List of regions: ['EAST', 'WEST', 'SOUTH', 'MIDWEST']
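
The stepping by two can be illustrated on its own. In this sketch the seed values are made-up examples, and a list comprehension over `range(0, len(teams), 2)` plays the role of the `i` and `j` counters.

```python
regions = ['EAST', 'WEST', 'SOUTH', 'MIDWEST']
regional_champions = [1, 2, 1, 1]  # hypothetical seeds, one per region

# Keep each region together with its champion's seed
teams = list(zip(regions, regional_champions))

# Positions 0 and 1 form one semifinal, positions 2 and 3 the other
semifinal_pairs = [(teams[i], teams[i + 1]) for i in range(0, len(teams), 2)]
for first_team, second_team in semifinal_pairs:
    print(first_team, 'vs.', second_team)
```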

In [18]:
def final_4(match_list, region_name):
    
    '''This runs the pick_final function for the matchups determined
    from the previous round (elite_8) and returns a list of the 
    winners of the round. The regions are now being used in this function
    to keep the regions and seedings together.'''
    
    matchups = tuple(zip(region_name, match_list))
    results_rd = []
    i = 0
    j = 1
    
    # The regions list ['EAST', 'WEST', 'SOUTH', 'MIDWEST']
    while i < len(matchups):
        winner = pick_final(matchups[i][1], matchups[j][1], list((matchups[i][0], matchups[j][0])))
        results_rd.append(winner)
        i += 2
        j += 2
    
    # List out the winners of the final four
    final_4_winners = list(results_rd)
    return final_4_winners

### 9. Combine and return all round results

Before we define the function to pick the brackets, from `IPython.display`, make sure to import both `display` and `Markdown` - _these will make the output of text easier to read._

Now it is time to use all of the functions defined previously - `round_1`, `round_2`, `sweet_16`, `elite_8`, `final_4`, and `pick_final`. This aggregation of all the previous functions will be called `pick_brackets` and will take `region_names` as an argument. `region_names` will be a list of all the regions in the bracket. Recall that the region names change depending on the year; we will cover how to order them in the next section.

It is important to generate every round up to and including the regional finals separately for each region. To do this, we will loop through the regions and generate the winners using the functions for each of the first four rounds: `round_1`, `round_2`, `sweet_16`, and `elite_8`. After this section, we no longer need to run through each region. The challenge for the last two functions - `final_4` and `pick_final` - is that we have both a region name and a seed number to keep track of.

Next, the national semifinal winners will be chosen using the `final_4` function. Remember that `final_4` returns a list of `(region, seed)` tuples, so we have to index each item by row and then by position. For example, `final_4_winners[0][0]` and `final_4_winners[0][1]` return the region and its associated seed for the first winner.
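
As a small illustration of that indexing, the sketch below uses made-up example values in the same shape as the list returned by `final_4`; tuple unpacking in a loop is shown as an equivalent alternative to the double index.

```python
# Example values only, shaped like the return value of final_4
final_4_winners = [('EAST', 1), ('MIDWEST', 1)]

# Double indexing: row first, then position within the (region, seed) tuple
first_region = final_4_winners[0][0]
first_seed = final_4_winners[0][1]
print(first_region + ": " + str(first_seed))

# Equivalent: unpack each (region, seed) tuple while looping
lines = [f"{region}: {seed}" for region, seed in final_4_winners]
print(lines)
```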

Finally, we crown a champion using the `pick_final` function. 

In [19]:
from IPython.display import display, Markdown

def pick_brackets(region_names):
    
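    '''pick_brackets simulates the first four rounds for each region in
    region_names, displays the results, and then picks the winners of the
    national semifinals and the champion.'''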
    
    # The name may change depending on year, the order of the list 'name' is important
    region = []
    name = region_names
    round_ = 0
    final_four = []
    
    # Run through each of the regions and display the results
    while round_ < len(name):
        
        display(Markdown("### " + name[round_]))
        
        rd_1_group = round_1()
        display(Markdown("First Round: " + str(rd_1_group)))
        
        rd_2_group = round_2(rd_1_group)
        display(Markdown("Second Round: " + str(rd_2_group)))
        
        rd_3_group = sweet_16(rd_2_group)
        display(Markdown("Regional Semifinals: " + str(rd_3_group)))
        
        conference_winner = elite_8(rd_3_group)
        display(Markdown("Regional Finals:   " + str(conference_winner)))
        
        final_four.append(conference_winner)
        round_ += 1
        
    # Display the Winners of the National Semifinals - also known as the Final Four
    display(Markdown("### National Semifinals"))
    final_4_winners = final_4(final_four, name)
    display(Markdown(str(final_4_winners[0][0]) + ": " + str(final_4_winners[0][1])))
    display(Markdown(str(final_4_winners[1][0]) + ": " + str(final_4_winners[1][1])))
    
    # Crown the champion and display both the region and seed of the winner
    champion = pick_final(final_4_winners[0][1], final_4_winners[1][1], list((final_4_winners[0][0], final_4_winners[1][0])))
    display(Markdown("### Champion: " + str(champion[0]) + ", Seed " + str(champion[1])))

### 10. Make your picks

<img src="https://www.ncaa.com/sites/default/files/public/styles/original/public-s3/images/2019/04/09/ncaa-tournament-bracket-2019-scores-games-virginia-texas-tech.png?itok=0E3VNWmI" alt="March Madness Bracket 2019" style="width:900px;"/>

###### Image Source: https://www.ncaa.com/news/basketball-men/ncaa-bracket-march-madness

Everything is now in place - _except for the region names_. Referencing the bracket above, the list of regions must follow a specific order for this to work. The order in the list must look like the following:

[`Upper Left`, `Lower Left`, `Upper Right`, `Lower Right`]

This sequence, _referencing the 2019 March Madness Bracket above_, would have the order of `EAST`, `WEST`, `SOUTH`, then `MIDWEST`. Once this step is complete, you may run the `pick_brackets`.

After the results are produced, you can fill in a bracket based on the results. For each of the regions, you will have 4 rounds of results to enter. For instance, if the second round results in the East are `((1,4),(7,3))`, then the next round in the East Region will be seeds 1 vs. 4 and seeds 7 vs. 3.
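
Reading a result tuple as a list of games can be sketched as follows. The helper name `describe_matchups` is hypothetical and is not part of the notebook's functions.

```python
def describe_matchups(round_result):
    """Turn a tuple of (seed, seed) pairs into readable matchup strings."""
    return [f"{seed_a} vs. {seed_b}" for seed_a, seed_b in round_result]

# The second round example from the paragraph above
east_second_round = ((1, 4), (7, 3))
next_games = describe_matchups(east_second_round)
print(next_games)
```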

In [20]:
# Enter in the region names and then run the pick_brackets function
region_names = ['EAST', 'WEST', 'SOUTH', 'MIDWEST']
pick_brackets(region_names)

### EAST

First Round: ((1, 8), (2, 10), (3, 6), (4, 5))

Second Round: ((1, 4), (2, 6))

Regional Semifinals: [1, 6]

Regional Finals:   1

### WEST

First Round: ((1, 8), (2, 10), (3, 6), (4, 5))

Second Round: ((1, 4), (2, 6))

Regional Semifinals: [1, 2]

Regional Finals:   2

### SOUTH

First Round: ((1, 8), (2, 7), (3, 6), (4, 5))

Second Round: ((1, 4), (2, 3))

Regional Semifinals: [1, 3]

Regional Finals:   1

### MIDWEST

First Round: ((1, 9), (2, 7), (3, 6), (4, 5))

Second Round: ((1, 5), (2, 6))

Regional Semifinals: [1, 2]

Regional Finals:   1

### National Semifinals

EAST: 1

MIDWEST: 1

### Champion: EAST, Seed 1