# IS 362 Alternative Project - Chess Tournament

In this project we are asked to process data from a chess tournament. The data comes in the form of a crosstable, sometimes referred to as a pivot table. The table can be found here: [Chess Tournament Data](https://github.com/meheino77/Alternative_Project/blob/master/tournamentinfo.txt). The requirements of the assignment are the following:

1. Create a Jupyter Notebook that generates a CSV file containing the following data: Player's Name, Player's State, Total Number of Points, Player's Pre-Rating, and Average Pre-Tournament Chess Rating of Opponents.
2. Calculate, for each player in the table, the average pre-tournament rating of the opponents that player faced.

## Reading in of the table data

To process the table, we need to notice that each player's data is spread over two rows, and that players are separated by a row of dashes ("-----------"). We need to skip these separator rows as well as the first four rows, which make up the table header. The code below shows how this was accomplished.

```python
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

player_name = []   # first line of each player's record
player_rate = []   # second line of each player's record

# column names for the two frames built from these lines
column_names1 = ['Player Number', 'Player Name', 'Total Points', 'Round 1',
                 'Round 2', 'Round 3', 'Round 4', 'Round 5', 'Round 6', 'Round 7']

column_names2 = ['Player State', 'Pre Rating', 'Post Rating']

with open('tournamentinfo.txt') as chess:
    # skip the four lines that make up the table header
    for _ in range(4):
        next(chess)

    line_count = 1
    for chess_line in chess:
        if chess_line.startswith("-------------"):
            # a separator ends a record; the next line starts a new player
            line_count = 1
        elif line_count == 1:
            # first line: player number, name, points and opponents
            player_name.append(chess_line)
            line_count = 2
        elif line_count == 2:
            # second line: state and ratings
            player_rate.append(chess_line)
            line_count = 1
        else:
            print("ERROR!!!!!")
# the file is closed automatically when the with block ends
```

This stores the first line of each player's record (the player's information and the opponents faced) in the list ```player_name```, and the second line (the player's state and ratings) in the list ```player_rate```.

## Processing the data in the lines

Next we need to extract the information from the fixed-width formatting of these lines. We start with the player information in ```player_name```. Each field sits at the same character positions in every line, so we can use a slice to retrieve it. The same approach applies to ```player_rate```. The code to process both ```player_name``` and ```player_rate``` is shown in the cell below.

```python
# DataFrame.append was removed in pandas 2.0, so collect one dict per row
# and build each frame in a single step.
rows1 = []
for pn in player_name:
    # each field sits at fixed character positions in the line
    rows1.append({'Player Number': pn[1:6].strip(),
                  'Player Name': pn[8:40].strip(),
                  'Total Points': float(pn[41:44]),
                  'Round 1': pn[48:52].strip(), 'Round 2': pn[54:58].strip(),
                  'Round 3': pn[60:64].strip(), 'Round 4': pn[66:70].strip(),
                  'Round 5': pn[72:76].strip(), 'Round 6': pn[78:82].strip(),
                  'Round 7': pn[84:88].strip()})

player_data1 = pd.DataFrame(rows1, columns=column_names1)

# process the player state and ratings
rows2 = []
for pr in player_rate:
    rows2.append({'Player State': pr[1:6].strip(),
                  'Pre Rating': pr[22:26].strip(),
                  'Post Rating': pr[31:35].strip()})

player_data2 = pd.DataFrame(rows2, columns=column_names2)
```
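The same slicing can be driven by a table of field positions instead of one statement per field. This is a sketch, not the notebook's code: ```FIELDS``` and ```parse_fixed_width``` are hypothetical names, the positions are copied from the slices used above, and the sample line is assembled to match those positions (with placeholder result letters) rather than copied from ```tournamentinfo.txt```.

```python
# Hypothetical field table: (column name, start, end) for string slicing.
# Positions are copied from the notebook's slices; each round is 6 characters apart.
FIELDS = [
    ("Player Number", 1, 6),
    ("Player Name", 8, 40),
    ("Total Points", 41, 44),
] + [(f"Round {k}", 48 + 6 * (k - 1), 52 + 6 * (k - 1)) for k in range(1, 8)]


def parse_fixed_width(line, fields=FIELDS):
    """Slice each field out of a fixed-width line and strip the padding."""
    return {name: line[start:end].strip() for name, start, end in fields}


# Sample line built so that every field lands at the positions above;
# the W/D result letters are placeholders, not taken from the data file.
rounds = ["W  39", "W  21", "W  18", "W  14", "W   7", "D  12", "D   4"]
line = "    1 | " + "GARY HUA".ljust(32) + "|6.0  |" + "".join(r + "|" for r in rounds)

record = parse_fixed_width(line)
print(record["Player Name"], record["Total Points"], record["Round 1"], record["Round 7"])
# prints: GARY HUA 6.0 39 4
```

Keeping the positions in one list makes them easy to check against the header row and to adjust in one place. Unlike the notebook, this sketch leaves ```Total Points``` as a string; it would still need ```float()``` before use.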

After processing the lines we now have two frames with the extracted data, and we need to merge them into one frame for later processing. Because both frames were built from the same sequence of records, the rows at the same position in each frame belong to the same player, so the frames can be merged on their indexes. The code below merges the two frames and prints the first three rows to confirm that the merge succeeded.

```python
player_data_all = pd.merge(player_data1, player_data2, right_index=True, left_index=True)

print(player_data_all.head(3))
```

```
  Player Number      Player Name  Total Points Round 1 Round 2 Round 3 Round 4 Round 5 Round 6 Round 7 Player State Pre Rating Post Rating
0             1         GARY HUA           6.0      39      21      18      14       7      12       4           ON       1794        1817
1             2  DAKSHESH DARURI           6.0      63      58       4      17      16      20       7           MI       1553        1663
2             3     ADITYA BAJAJ           6.0       8      61      25      21      11      13      12           MI       1384        1640
```
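Merging with ```left_index=True``` and ```right_index=True``` pairs rows by index label, which is why it matters that both frames were built from the same sequence of records. A small sketch with two toy frames (made-up values standing in for ```player_data1``` and ```player_data2```) shows that this merge gives the same result as a column-wise ```pd.concat```:

```python
import pandas as pd

# toy frames standing in for player_data1 and player_data2 (made-up values)
left = pd.DataFrame({"Player Number": ["1", "2"], "Player Name": ["A", "B"]})
right = pd.DataFrame({"Player State": ["ON", "MI"], "Pre Rating": ["1500", "1400"]})

# both calls align rows by index label (0 with 0, 1 with 1)
merged = pd.merge(left, right, left_index=True, right_index=True)
stacked = pd.concat([left, right], axis=1)

print(merged)
print(merged.equals(stacked))  # True
```

If the two frames ever had different numbers of rows, the default inner merge would silently drop unmatched rows, so comparing ```len()``` of both frames before merging is a cheap safeguard.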


To make the player data easier to process, I extract the player numbers so that I can loop through them with a **for** loop. The code below stores them in a list so that they can be iterated over.

```python
# Extract the player numbers.
player_num_lst = player_data_all['Player Number'].tolist()
```

## Processing the opponents' ratings and getting the average

In this section we look up the pre-tournament rating of each opponent a player faced and collect those ratings in the list ```rating_list```, which is then averaged. The code to extract the opponents' ratings is shown below.

```python
# Map each player number to that player's pre-tournament rating
pre_rating_by_num = {num: int(rating) for num, rating in
                     zip(player_data_all['Player Number'], player_data_all['Pre Rating'])}

round_cols = ['Round 1', 'Round 2', 'Round 3', 'Round 4', 'Round 5', 'Round 6', 'Round 7']

opponent_avg = []

for num in player_num_lst:

    # retrieve the opponents for this player; an empty string means no
    # opponent was recorded for that round (for example, a bye)
    opponents = player_data_all.loc[player_data_all['Player Number'] == num, round_cols].values[0]

    # look up the pre-rating of every opponent actually faced
    rating_list = [pre_rating_by_num[opp] for opp in opponents if opp != ""]

    # calculate the average of the opponents' ratings
    average = sum(rating_list) / len(rating_list)
    opponent_avg.append(int(average))
```
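The per-player loop can also be replaced by a vectorized pandas sketch: reshape the round columns into one row per (player, round) with ```melt```, map each opponent number to a pre-rating, and average per player with ```groupby```. This is a swapped-in technique, not the notebook's method, and the toy frame below uses made-up numbers. Since ```mean()``` returns floats, the result is truncated with ```astype(int)``` to match the ```int(average)``` used above.

```python
import pandas as pd

# toy stand-in for player_data_all (made-up values); "" marks a round with no opponent
players = pd.DataFrame({
    "Player Number": ["1", "2", "3"],
    "Pre Rating": ["1600", "1500", "1401"],
    "Round 1": ["2", "1", ""],
    "Round 2": ["3", "3", "1"],
})
round_cols = ["Round 1", "Round 2"]

# one row per (player, round) pair, dropping rounds with no opponent
long = players.melt(id_vars="Player Number", value_vars=round_cols, value_name="Opponent")
long = long[long["Opponent"] != ""].copy()

# map opponent numbers to their pre-tournament ratings
rating_by_num = players.set_index("Player Number")["Pre Rating"].astype(int)
long["Opponent Rating"] = long["Opponent"].map(rating_by_num)

# average per player, truncated toward zero like int(average)
opponent_avg = long.groupby("Player Number")["Opponent Rating"].mean().astype(int)
print(opponent_avg.tolist())  # [1450, 1500, 1600]
```

For a 64-player table the loop is fast enough either way; the vectorized form mainly avoids repeating a boolean lookup on the whole frame for every player.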

At the end of each iteration, the average is calculated by the following:

```python
average = sum(rating_list) / len(rating_list)
opponent_avg.append(int(average))
```

The integer part of each average is appended to ```opponent_avg```, and the list is then added to the frame as a new column using the following code:

```python
# add the averages to the data frame as a new column
player_data_all['Opponent Average'] = opponent_avg

print(player_data_all.head())
```

```
  Player Number          Player Name  Total Points Round 1 Round 2 Round 3 Round 4 Round 5 Round 6 Round 7 Player State Pre Rating Post Rating  Opponent Average
0             1             GARY HUA           6.0      39      21      18      14       7      12       4           ON       1794        1817              1605
1             2      DAKSHESH DARURI           6.0      63      58       4      17      16      20       7           MI       1553        1663              1469
2             3         ADITYA BAJAJ           6.0       8      61      25      21      11      13      12           MI       1384        1640              1563
3             4  PATRICK H SCHILLING           5.5      23      28       2      26       5      19       1           MI       1716        1744              1573
4             5           HANSHI ZUO           5.5      45      37      12      13       4      14      17           MI       1655        1690              1500
```
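Because ```int()``` truncates toward zero, an average such as 1604.9 is stored as 1604, not 1605. If rounding to the nearest integer were preferred (a choice the assignment text above does not settle), Python's built-in ```round()``` could be used instead; note that it rounds exact halves to the nearest even number. A small comparison with made-up averages:

```python
# made-up opponent averages, not taken from the tournament data
averages = [1604.2, 1604.5, 1604.9, 1605.5]

truncated = [int(a) for a in averages]   # drops the fractional part
rounded = [round(a) for a in averages]   # nearest integer, halves to even

print(truncated)  # [1604, 1604, 1604, 1605]
print(rounded)    # [1604, 1604, 1605, 1606]
```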


## Cleaning up the frame

We need to drop some columns from ```player_data_all``` to meet the assignment specification: Round 1 through Round 7 and Post Rating. I will also reorder the remaining columns to match the output required by the assignment.

```python
# Drop the unneeded columns from the frame.
player_data_all.drop(['Round 1', 'Round 2', 'Round 3', 'Round 4', 'Round 5', 'Round 6', 'Round 7', 'Post Rating'],
                     axis=1, inplace=True)

# Reorder the columns. reindex() returns a new frame, so the result is
# assigned back; the labels must match the existing column names exactly.
reorder_col = ["Player Number", "Player Name", "Player State", "Total Points", "Pre Rating", "Opponent Average"]
player_data_all = player_data_all.reindex(columns=reorder_col)

print(player_data_all.head())
```

```
  Player Number          Player Name Player State  Total Points Pre Rating  Opponent Average
0             1             GARY HUA           ON           6.0       1794              1605
1             2      DAKSHESH DARURI           MI           6.0       1553              1469
2             3         ADITYA BAJAJ           MI           6.0       1384              1563
3             4  PATRICK H SCHILLING           MI           5.5       1716              1573
4             5           HANSHI ZUO           MI           5.5       1655              1500
```
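Two details of ```reindex``` are easy to miss: it returns a new frame rather than modifying the original, and a label that does not exist (such as a misspelled ```"Prev Rating"```) silently becomes a column of ```NaN``` values instead of raising an error. A toy illustration, reusing two rating values from the output above:

```python
import pandas as pd

df = pd.DataFrame({"Total Points": [6.0, 5.5], "Pre Rating": ["1794", "1553"]})

# reindex returns a new frame; df itself keeps its original column order
reordered = df.reindex(columns=["Pre Rating", "Total Points"])
print(list(df.columns))         # ['Total Points', 'Pre Rating']
print(list(reordered.columns))  # ['Pre Rating', 'Total Points']

# a misspelled label is not an error: it becomes a column of NaN
typo = df.reindex(columns=["Prev Rating", "Total Points"])
print(typo["Prev Rating"].isna().all())  # True

# plain column selection raises KeyError for unknown labels instead
try:
    df[["Prev Rating", "Total Points"]]
except KeyError as err:
    print("KeyError:", err)
```

Selecting columns with ```df[list_of_names]``` is therefore a stricter way to reorder when every name is expected to exist.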


## Outputting to a file

The final requirement of the assignment is to write the results to a file. The code below creates a file called "chess.csv".

```python
# write the data to a CSV file, without the index column
player_data_all.to_csv("chess.csv", index=False)
```
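One way to sanity-check the written file is to read it back and compare it with the frame. The sketch below does this in memory with ```io.StringIO``` on a small made-up frame, so it runs without touching disk; with the notebook's frame, the same check would read ```chess.csv``` instead. Passing ```dtype``` for the text columns keeps values such as player numbers as strings, matching how they were stored in the frame.

```python
import io

import pandas as pd

# made-up frame with the same columns and dtypes as the notebook's final frame
df = pd.DataFrame({
    "Player Number": ["1", "2"],
    "Player Name": ["PLAYER ONE", "PLAYER TWO"],
    "Player State": ["ON", "MI"],
    "Total Points": [6.0, 5.5],
    "Pre Rating": ["1600", "1500"],
    "Opponent Average": [1550, 1450],
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # same call as the notebook, but into memory
buffer.seek(0)

# read the text columns back as strings so the dtypes match the original frame
round_trip = pd.read_csv(buffer, dtype={"Player Number": str, "Pre Rating": str})
print(round_trip.equals(df))
```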

The output file for this requirement can be found here: [chess.csv](https://github.com/meheino77/Alternative_Project/blob/master/chess.csv). This final step completes the requirements for this assignment.