# Project 2
For Project 2, we built an ETL pipeline that creates a database of Phish live shows from 1993 to 2023. First, we scraped phish.net for setlists and the Wikipedia page on Phish concert tours and festivals for attendance and box office data. We then transformed the extracted data by reordering and renaming columns, reformatting values, and dropping rows that lacked useful data. We stored the results for each year in their own variable and kept those variables together in a list. To load the data, we converted each variable to a DataFrame and wrote the DataFrames to CSV files. Finally, we loaded the CSV files directly into SQLite and PostgreSQL.

The resulting database supports many interesting questions. For example, we could look at how City and Year affect Attendance and the Attendance/Capacity ratio. We could also count how often each City recurs and weigh Attendance and Gross to build a predictive model of the cities most likely to be announced for future show dates.

In [1]:
#from splinter import Browser
#from bs4 import BeautifulSoup as soup
from datetime import datetime
import pandas as pd
import requests
from bs4 import BeautifulSoup
import os
import shutil
from pathlib import Path
import sqlite3
from sqlalchemy import create_engine
from sqlalchemy.types import Integer, Text, String, DateTime, Float
import psycopg2

## Part 1: Extract
#### Scraping for Data

In [2]:
#browser = Browser('chrome')

In [3]:
# Pull in setlist data

url = 'https://phish.net/setlists/phish/'
#browser.visit(url)
#html = browser.html
#phish_soup = soup(html, 'html.parser')
response = requests.get(url)

phish_soup = BeautifulSoup(response.text, 'html.parser')

In [4]:
# Pull in concert attendance and box office data

url2 = 'https://en.wikipedia.org/wiki/Phish_concert_tours_and_festivals#Box_office_score_data'
#browser.visit(url2)
#html2 = browser.html
#tour_soup = soup(html2, 'html.parser')

# Request the Wikipedia page (url2), not the setlist page
response = requests.get(url2)

tour_soup = BeautifulSoup(response.text, 'html.parser')

In [5]:
# pd.read_html for the win!

table_dfs = pd.read_html('https://en.wikipedia.org/wiki/Phish_concert_tours_and_festivals#Box_office_score_data')
money_dfs = []

for table_df in table_dfs:
    if table_df.columns[-1] == 'Gross':
        money_dfs.append(table_df)
money_dfs

[   Date (1993)                      City                 Venue  \
 0       May 29    Salinas, United States   Laguna Seca Raceway   
 1       May 30    Salinas, United States   Laguna Seca Raceway   
 2  December 31  Worcester, United States  Centrum in Worcester   
 3        TOTAL                     TOTAL                 TOTAL   
 
                Attendance     Gross  
 0         20,000 / 20,000  $504,082  
 1         20,000 / 20,000  $504,082  
 2         14,581 / 14,581  $320,220  
 3  34,581 / 34,581 (100%)  $824,302  ,
    Date (1994)                          City  \
 0  December 28   Philadelphia, United States   
 1  December 29     Providence, United States   
 2  December 30  New York City, United States   
 3  December 31         Boston, United States   
 4        TOTAL                         TOTAL   
 
                                            Venue              Attendance  \
 0  Philadelphia Convention Hall and Civic Center         10,325 / 10,325   
 1               

In [6]:
money_dfs[4]

Unnamed: 0,Date (1998),City,Venue,Attendance,Gross
0,July 21,"Virginia Beach, United States",GTE Virginia Beach Amphitheatre,"20,074 / 20,074","$486,775"
1,August 2,"George, United States",The Gorge Amphitheatre,"37,871 / 40,000","$1,023,129"
2,August 3,"George, United States",The Gorge Amphitheatre,"37,871 / 40,000","$1,023,129"
3,August 9,"East Troy, United States",Alpine Valley Music Theatre,"34,642 / 34,642","$866,202"
4,August 10,"Noblesville, United States",Deer Creek Music Center,"41,782 / 41,782","$1,044,762"
5,August 11,"Noblesville, United States",Deer Creek Music Center,"41,782 / 41,782","$1,044,762"
6,August 16,"Limestone, United States",Loring Air Force Base,"123,176 / 123,176","$4,337,184"
7,August 17,"Limestone, United States",Loring Air Force Base,"123,176 / 123,176","$4,337,184"
8,December 29,"New York City, United States",Madison Square Garden,"56,704 / 56,704","$1,583,886"
9,December 30,"New York City, United States",Madison Square Garden,"56,704 / 56,704","$1,583,886"
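
Each box office table carries its year in the date column header, such as `Date (1993)`. As a minimal sketch, the year could be read from that header instead of being typed by hand in every cell. The helper name `year_from_columns` is hypothetical and not part of this notebook. The result should still be checked against the table contents: the table at index 4 is labeled `Date (1998)` but is processed later in this notebook as 1997 data.

```python
import re


def year_from_columns(columns):
    """Return the year found in a 'Date (YYYY)' column label, or None."""
    for label in columns:
        match = re.fullmatch(r"Date \((\d{4})\)", str(label))
        if match:
            return int(match.group(1))
    return None


# Column labels copied from the 1993 table shown earlier
columns_1993 = ["Date (1993)", "City", "Venue", "Attendance", "Gross"]
print(year_from_columns(columns_1993))
```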


Now we begin scraping the setlist data!

In [7]:
dates = phish_soup.find_all('span', class_='setlist-date')
dates

date_strings = [date.text[-11:] for date in dates]

cleaned_date_strings = [date.strip() for date in date_strings]
cleaned_date_strings[0:5]

['09/03/2023', '09/02/2023', '09/01/2023', '08/31/2023', '08/26/2023']
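
The slice `date.text[-11:]` works because the date happens to sit at the end of each span's text. A regular expression is one less position-dependent alternative. The sketch below assumes every date appears as MM/DD/YYYY, and `extract_show_date` is a hypothetical helper rather than code from this notebook.

```python
import re

# Matches dates written as MM/DD/YYYY anywhere in the text
DATE_PATTERN = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")


def extract_show_date(span_text):
    """Return the first MM/DD/YYYY date in span_text, or None if absent."""
    match = DATE_PATTERN.search(span_text)
    return match.group(1) if match else None


# Text shaped like the first setlist-date span, including its whitespace
sample_text = "\nPHISH,\n\tSUNDAY 09/03/2023\n"
print(extract_show_date(sample_text))
```

Returning `None` for spans without a date makes missing values easy to spot before they reach the DataFrame.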

In [8]:
dates

[<span class="setlist-date">
 <a href="/setlists/phish">PHISH</a>,
 	<a href="https://phish.net/setlists/phish-september-03-2023-dicks-sporting-goods-park-commerce-city-co-usa.html">SUNDAY 09/03/2023</a>
 </span>,
 <span class="setlist-date">
 <a href="/setlists/phish">PHISH</a>,
 	<a href="https://phish.net/setlists/phish-september-02-2023-dicks-sporting-goods-park-commerce-city-co-usa.html">SATURDAY 09/02/2023</a>
 </span>,
 <span class="setlist-date">
 <a href="/setlists/phish">PHISH</a>,
 	<a href="https://phish.net/setlists/phish-september-01-2023-dicks-sporting-goods-park-commerce-city-co-usa.html">FRIDAY 09/01/2023</a>
 </span>,
 <span class="setlist-date">
 <a href="/setlists/phish">PHISH</a>,
 	<a href="https://phish.net/setlists/phish-august-31-2023-dicks-sporting-goods-park-commerce-city-co-usa.html">THURSDAY 08/31/2023</a>
 </span>,
 <span class="setlist-date">
 <a href="/setlists/phish">PHISH</a>,
 	<a href="https://phish.net/setlists/phish-august-26-2023-broadview-stage-a

In [9]:
venues = phish_soup.find_all('div', class_='setlist-venue')
venues

venue_strings = [venue.text.strip() for venue in venues]
venue_strings[0:5]

["DICK'S SPORTING GOODS PARK",
 "DICK'S SPORTING GOODS PARK",
 "DICK'S SPORTING GOODS PARK",
 "DICK'S SPORTING GOODS PARK",
 'BROADVIEW STAGE AT SPAC']

In [10]:
locations = phish_soup.find_all('div', class_='setlist-location')
locations

locations = [location.text.strip() for location in locations]

cities = [location.split(',')[0] for location in locations]
states = [location[-2:] for location in locations]

print(cities[0:5])
print(states[0:5])
    

['Commerce City', 'Commerce City', 'Commerce City', 'Commerce City', 'Saratoga Springs']
['CO', 'CO', 'CO', 'CO', 'NY']
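
Taking the last two characters with `location[-2:]` assumes every location ends in a two-letter state code. A sketch of a comma-based split is shown below as an alternative. It assumes the text has the form `City, Region`, possibly followed by more comma-separated parts, and `split_location` is a hypothetical helper. The three-part input in the example is made up for illustration, not taken from the scraped data.

```python
def split_location(location):
    """Split 'City, Region[, ...]' into (city, region); region may be ''."""
    parts = [part.strip() for part in location.split(",")]
    city = parts[0]
    region = parts[1] if len(parts) > 1 else ""
    return city, region


print(split_location("Commerce City, CO"))
print(split_location("Saratoga Springs, NY"))
# Hypothetical three-part location to show the behavior with extra parts
print(split_location("Some City, Some Region, Some Country"))
```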


In [11]:
set_list_notes = phish_soup.find_all('div', class_='setlist-notes')
set_list_notes

[<div class="setlist-notes"><br/>
 	This show started late and consisted of only one set due to a lengthy weather delay.</div>,
 <div class="setlist-notes"><br/>
 	Reba did not contain the whistling ending. Chalk Dust was unfinished. Mike teased Merrily We Roll Along after the soundcheck's Fast Enough for You.</div>,
 <div class="setlist-notes"><br/>
 	Page teased In the Mood after the soundcheck's second Ether Edge.</div>,
 <div class="setlist-notes"><br/>
 	Trey dedicated Carini to Frenchie (Tim Gazaille), a fan who had passed away earlier in the year and was the "naked guy" during the rain delay at Dick's the previous summer. Trey teased San-Ho-Zay in Halley's Comet. Blaze On's lyrics were changed to "we'll be dancing here at Dick's." We Are Come To Outlive Our Brains was performed for the first time since August 6, 2021 (109 shows).</div>,
 <div class="setlist-notes"><br/>
 	This performance was a benefit for Vermont and Upstate New York flood recovery efforts. Down with Disease wa

In [12]:
phish_p1_df = pd.DataFrame({
    'Date': cleaned_date_strings,
    'Venue': venue_strings,
    'City': cities,
    'State': states
})
phish_p1_df[-5:]

Unnamed: 0,Date,Venue,City,State
93,10/20/2021,MATTHEW KNIGHT ARENA,Eugene,OR
94,10/19/2021,MATTHEW KNIGHT ARENA,Eugene,OR
95,10/17/2021,CHASE CENTER,San Francisco,CA
96,10/16/2021,CHASE CENTER,San Francisco,CA
97,10/15/2021,GOLDEN 1 CENTER,Sacramento,CA


## Part 2: Transform
#### Cleaning and Formatting Data

In [13]:
money_dfs_1993 = money_dfs[0]
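# Strip the dollar sign; regex=False treats '$' as a literal character
money_dfs_1993['Gross'] = money_dfs_1993['Gross'].str.replace('$', '', regex=False)

# Split "attendance / capacity" into two columns
money_dfs_1993_split = money_dfs_1993['Attendance'].str.split('/', expand=True)
money_dfs_1993['Attendance'] = money_dfs_1993_split[0]
money_dfs_1993['Capacity'] = money_dfs_1993_split[1]

desired_column_order = ['Date (1993)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']

# Copy the reordered selection so later assignments do not trigger SettingWithCopyWarning
money_dfs_1993_reordered = money_dfs_1993[desired_column_order].copy()
money_dfs_1993 = money_dfs_1993_reordered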

money_dfs_1993['City'] = money_dfs_1993['City'].str.replace(', United States', '')

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=1993)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_1993['Date (1993)'] = money_dfs_1993['Date (1993)'].apply(convert_date)

# Row 3 is the TOTAL summary row, not a show
money_dfs_1993 = money_dfs_1993.drop(3)
money_dfs_1993 = money_dfs_1993.reset_index(drop=True)

money_dfs_1993 = money_dfs_1993.rename(columns={'Date (1993)': 'Date'})


money_dfs[0] = money_dfs_1993

money_dfs[0]





Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,05/29/1993,Salinas,Laguna Seca Raceway,20000,20000,504082
1,05/30/1993,Salinas,Laguna Seca Raceway,20000,20000,504082
2,12/31/1993,Worcester,Centrum in Worcester,14581,14581,320220
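
The `convert_date` helper above hard-codes the year, so it has to be redefined for every table. As a sketch of one alternative, the year can be passed in as a parameter. This version uses the standard library's `datetime.strptime` in place of `pd.to_datetime`, which is a swapped-in technique rather than the notebook's own approach. It parses the month, day, and year together so that February 29 is handled in leap years, and it assumes English month names.

```python
from datetime import datetime


def convert_date(date_str, year):
    """Turn 'May 29' plus a year into '05/29/1993'.

    Values that are not a 'Month day' string (for example 'TOTAL')
    are returned unchanged.
    """
    try:
        parsed = datetime.strptime(f"{date_str} {year}", "%B %d %Y")
    except ValueError:
        return date_str
    return parsed.strftime("%m/%d/%Y")


print(convert_date("May 29", 1993))       # 05/29/1993
print(convert_date("December 31", 1993))  # 12/31/1993
print(convert_date("TOTAL", 1993))        # TOTAL
```

With pandas this could be applied as `df['Date (1993)'].apply(convert_date, year=1993)`, since `Series.apply` forwards extra keyword arguments to the function.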


In [14]:
money_dfs_1994 = money_dfs[1]
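# Strip the dollar sign; regex=False treats '$' as a literal character
money_dfs_1994['Gross'] = money_dfs_1994['Gross'].str.replace('$', '', regex=False)

# Split "attendance / capacity" into two columns
money_dfs_1994_split = money_dfs_1994['Attendance'].str.split('/', expand=True)
money_dfs_1994['Attendance'] = money_dfs_1994_split[0]
money_dfs_1994['Capacity'] = money_dfs_1994_split[1]

desired_column_order = ['Date (1994)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
# Copy the reordered selection so later assignments do not trigger SettingWithCopyWarning
money_dfs_1994_reordered = money_dfs_1994[desired_column_order].copy()
money_dfs_1994 = money_dfs_1994_reordered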

money_dfs_1994['City'] = money_dfs_1994['City'].str.replace(', United States', '')

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=1994)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_1994['Date (1994)'] = money_dfs_1994['Date (1994)'].apply(convert_date)

# Row 4 is the TOTAL summary row, not a show
money_dfs_1994 = money_dfs_1994.drop(4)
money_dfs_1994 = money_dfs_1994.reset_index(drop=True)

money_dfs_1994 = money_dfs_1994.rename(columns={'Date (1994)': 'Date'})


money_dfs[1] = money_dfs_1994

money_dfs[1]




Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,12/28/1994,Philadelphia,Philadelphia Convention Hall and Civic Center,10325,10325,201338
1,12/29/1994,Providence,Providence Civic Center,14174,14174,272532
2,12/30/1994,New York City,Madison Square Garden,18977,18977,426978
3,12/31/1994,Boston,Boston Garden,15135,15135,355673
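
The Attendance column packs two numbers into one string, such as `10,325 / 10,325`, and the TOTAL rows add a percentage after them. The sketch below shows one way to parse that text into integers in a single step. It goes further than the cells above, which keep the values as strings, and the helper name `split_attendance` is hypothetical.

```python
import re

# Two comma-grouped numbers separated by a slash, e.g. "37,871 / 40,000"
ATTENDANCE_PATTERN = re.compile(r"\s*(\d[\d,]*)\s*/\s*(\d[\d,]*)")


def split_attendance(value):
    """Parse 'attended / capacity' text into two ints, or (None, None)."""
    if not isinstance(value, str):
        return None, None  # missing values such as NaN
    match = ATTENDANCE_PATTERN.match(value)
    if not match:
        return None, None
    attended, capacity = (int(group.replace(",", "")) for group in match.groups())
    return attended, capacity


print(split_attendance("10,325 / 10,325"))
print(split_attendance("34,581 / 34,581 (100%)"))
print(split_attendance(float("nan")))
```

Returning `None` for unparseable cells keeps "no data" distinct from a real attendance of zero.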


In [15]:
money_dfs_1995 = money_dfs[2]
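# Strip the dollar sign; regex=False treats '$' as a literal character
money_dfs_1995['Gross'] = money_dfs_1995['Gross'].str.replace('$', '', regex=False)

# Split "attendance / capacity" into two columns
money_dfs_1995_split = money_dfs_1995['Attendance'].str.split('/', expand=True)
money_dfs_1995['Attendance'] = money_dfs_1995_split[0]
money_dfs_1995['Capacity'] = money_dfs_1995_split[1]

desired_column_order = ['Date (1995)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
# Copy the reordered selection so later assignments do not trigger SettingWithCopyWarning
money_dfs_1995_reordered = money_dfs_1995[desired_column_order].copy()
money_dfs_1995 = money_dfs_1995_reordered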

money_dfs_1995['City'] = money_dfs_1995['City'].str.replace(', United States', '')

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=1995)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_1995['Date (1995)'] = money_dfs_1995['Date (1995)'].apply(convert_date)

money_dfs_1995 = money_dfs_1995.drop(16)
money_dfs_1995 = money_dfs_1995.reset_index(drop=True)

money_dfs_1995 = money_dfs_1995.rename(columns={'Date (1995)': 'Date'})


money_dfs[2] = money_dfs_1995

money_dfs[2]



Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,06/23/1995,Stanhope,Waterloo Village,16643,16643,377400
1,06/28/1995,Wantagh,Jones Beach Amphitheater,22110,22110,487475
2,06/29/1995,Wantagh,Jones Beach Amphitheater,22110,22110,487475
3,10/31/1995,Rosemont,Rosemont Horizon,18311,18311,411998
4,11/09/1995,Atlanta,Fox Theatre,13547,13547,304808
5,11/10/1995,Atlanta,Fox Theatre,13547,13547,304808
6,11/11/1995,Atlanta,Fox Theatre,13547,13547,304808
7,11/24/1995,Pittsburgh,Civic Arena,10669,18742,213380
8,11/25/1995,Hampton,Hampton Coliseum,12903,12903,260976
9,12/04/1995,Amherst,William D. Mullins Memorial Center,21018,21018,420360


In [16]:
money_dfs_1996 = money_dfs[3]
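# Strip the dollar sign; regex=False treats '$' as a literal character
money_dfs_1996['Gross'] = money_dfs_1996['Gross'].str.replace('$', '', regex=False)

# Split "attendance / capacity" into two columns
money_dfs_1996_split = money_dfs_1996['Attendance'].str.split('/', expand=True)
money_dfs_1996['Attendance'] = money_dfs_1996_split[0]
money_dfs_1996['Capacity'] = money_dfs_1996_split[1]

desired_column_order = ['Date (1996)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
# Copy the reordered selection so later assignments do not trigger SettingWithCopyWarning
money_dfs_1996_reordered = money_dfs_1996[desired_column_order].copy()
money_dfs_1996 = money_dfs_1996_reordered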

money_dfs_1996['City'] = money_dfs_1996['City'].str.replace(', United States', '')

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=1996)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_1996['Date (1996)'] = money_dfs_1996['Date (1996)'].apply(convert_date)

money_dfs_1996 = money_dfs_1996.drop(18)
money_dfs_1996 = money_dfs_1996.reset_index(drop=True)

money_dfs_1996 = money_dfs_1996.rename(columns={'Date (1996)': 'Date'})


money_dfs[3] = money_dfs_1996

money_dfs[3]



Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,08/04/1996,Morrison,Red Rocks Amphitheatre,36962,36962,924050
1,08/05/1996,Morrison,Red Rocks Amphitheatre,36962,36962,924050
2,08/06/1996,Morrison,Red Rocks Amphitheatre,36962,36962,924050
3,08/07/1996,Morrison,Red Rocks Amphitheatre,36962,36962,924050
4,08/12/1996,Noblesville,Deer Creek Music Center,42158,42158,851865
5,08/13/1996,Noblesville,Deer Creek Music Center,42158,42158,851865
6,08/14/1996,Hershey,Hersheypark Stadium,25100,25100,619100
7,08/16/1996,Plattsburgh,Plattsburgh Air Force Base,135267,135267,3310245
8,08/17/1996,Plattsburgh,Plattsburgh Air Force Base,135267,135267,3310245
9,10/21/1996,New York City,Madison Square Garden,34204,34204,857744
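
The cells above repeat the same steps for each year with only the year and the dropped row numbers changing. As a sketch, those steps could live in one function. The name `clean_box_office` is hypothetical, and the rule used to find summary rows is an assumption: it treats any row whose Date and City cells hold the same text (as in the TOTAL row of the 1993 table) as a non-show row. That rule has not been checked against every table on the page. The sketch also strips the whitespace left around the split attendance values, which the cells above do not do.

```python
import pandas as pd


def clean_box_office(table, date_column, year):
    """Return a cleaned copy of one yearly box office table."""
    df = table.copy()  # work on a copy so the scraped table is untouched

    # Assumption: summary rows repeat one label across Date and City,
    # so they can be removed without hard-coded row numbers.
    df = df[df[date_column] != df["City"]].reset_index(drop=True)

    # '$' is a regex metacharacter, so replace it as a literal string
    df["Gross"] = df["Gross"].str.replace("$", "", regex=False)

    # "attendance / capacity" becomes two columns
    parts = df["Attendance"].str.split("/", expand=True)
    df["Attendance"] = parts[0].str.strip()
    df["Capacity"] = parts[1].str.strip()

    df["City"] = df["City"].str.replace(", United States", "", regex=False)

    # Parse "May 29" together with the year, then format as MM/DD/YYYY
    dates = pd.to_datetime(
        df[date_column] + f" {year}", format="%B %d %Y", errors="coerce"
    )
    df[date_column] = dates.dt.strftime("%m/%d/%Y")

    df = df.rename(columns={date_column: "Date"})
    return df[["Date", "City", "Venue", "Attendance", "Capacity", "Gross"]]


# Rows copied from the raw 1993 table shown earlier in this notebook
raw_1993 = pd.DataFrame({
    "Date (1993)": ["May 29", "May 30", "December 31", "TOTAL"],
    "City": ["Salinas, United States", "Salinas, United States",
             "Worcester, United States", "TOTAL"],
    "Venue": ["Laguna Seca Raceway", "Laguna Seca Raceway",
              "Centrum in Worcester", "TOTAL"],
    "Attendance": ["20,000 / 20,000", "20,000 / 20,000",
                   "14,581 / 14,581", "34,581 / 34,581 (100%)"],
    "Gross": ["$504,082", "$504,082", "$320,220", "$824,302"],
})

cleaned_1993 = clean_box_office(raw_1993, "Date (1993)", 1993)
print(cleaned_1993)
```

Because the function copies its input and returns a new DataFrame, it avoids the chained-assignment pattern behind pandas' SettingWithCopyWarning.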


In [17]:
money_dfs[4]

Unnamed: 0,Date (1998),City,Venue,Attendance,Gross
0,July 21,"Virginia Beach, United States",GTE Virginia Beach Amphitheatre,"20,074 / 20,074","$486,775"
1,August 2,"George, United States",The Gorge Amphitheatre,"37,871 / 40,000","$1,023,129"
2,August 3,"George, United States",The Gorge Amphitheatre,"37,871 / 40,000","$1,023,129"
3,August 9,"East Troy, United States",Alpine Valley Music Theatre,"34,642 / 34,642","$866,202"
4,August 10,"Noblesville, United States",Deer Creek Music Center,"41,782 / 41,782","$1,044,762"
5,August 11,"Noblesville, United States",Deer Creek Music Center,"41,782 / 41,782","$1,044,762"
6,August 16,"Limestone, United States",Loring Air Force Base,"123,176 / 123,176","$4,337,184"
7,August 17,"Limestone, United States",Loring Air Force Base,"123,176 / 123,176","$4,337,184"
8,December 29,"New York City, United States",Madison Square Garden,"56,704 / 56,704","$1,583,886"
9,December 30,"New York City, United States",Madison Square Garden,"56,704 / 56,704","$1,583,886"


In [18]:
money_dfs_1997 = money_dfs[4]
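# Strip the dollar sign; regex=False treats '$' as a literal character
money_dfs_1997['Gross'] = money_dfs_1997['Gross'].str.replace('$', '', regex=False)

# Split "attendance / capacity" into two columns
money_dfs_1997_split = money_dfs_1997['Attendance'].str.split('/', expand=True)
money_dfs_1997['Attendance'] = money_dfs_1997_split[0]
money_dfs_1997['Capacity'] = money_dfs_1997_split[1]

# The scraped table labels its date column 'Date (1998)' even though
# this notebook treats its rows as 1997 shows
desired_column_order = ['Date (1998)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
# Copy the reordered selection so later assignments do not trigger SettingWithCopyWarning
money_dfs_1997_reordered = money_dfs_1997[desired_column_order].copy()
money_dfs_1997 = money_dfs_1997_reordered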

money_dfs_1997['City'] = money_dfs_1997['City'].str.replace(', United States', '')

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=1997)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_1997['Date (1998)'] = money_dfs_1997['Date (1998)'].apply(convert_date)

money_dfs_1997 = money_dfs_1997.drop(11)
money_dfs_1997 = money_dfs_1997.reset_index(drop=True)

money_dfs_1997 = money_dfs_1997.rename(columns={'Date (1998)': 'Date'})

money_dfs[4] = money_dfs_1997


money_dfs[4]



Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,07/21/1997,Virginia Beach,GTE Virginia Beach Amphitheatre,20074,20074,486775
1,08/02/1997,George,The Gorge Amphitheatre,37871,40000,1023129
2,08/03/1997,George,The Gorge Amphitheatre,37871,40000,1023129
3,08/09/1997,East Troy,Alpine Valley Music Theatre,34642,34642,866202
4,08/10/1997,Noblesville,Deer Creek Music Center,41782,41782,1044762
5,08/11/1997,Noblesville,Deer Creek Music Center,41782,41782,1044762
6,08/16/1997,Limestone,Loring Air Force Base,123176,123176,4337184
7,08/17/1997,Limestone,Loring Air Force Base,123176,123176,4337184
8,12/29/1997,New York City,Madison Square Garden,56704,56704,1583886
9,12/30/1997,New York City,Madison Square Garden,56704,56704,1583886


In [19]:
money_dfs_1998 = money_dfs[5]
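# Strip the dollar sign; regex=False treats '$' as a literal character
money_dfs_1998['Gross'] = money_dfs_1998['Gross'].str.replace('$', '', regex=False)

# Split "attendance / capacity" into two columns
money_dfs_1998_split = money_dfs_1998['Attendance'].str.split('/', expand=True)
money_dfs_1998['Attendance'] = money_dfs_1998_split[0]
money_dfs_1998['Capacity'] = money_dfs_1998_split[1]

desired_column_order = ['Date (1998)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
# Copy the reordered selection so later assignments do not trigger SettingWithCopyWarning
money_dfs_1998_reordered = money_dfs_1998[desired_column_order].copy()
money_dfs_1998 = money_dfs_1998_reordered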

money_dfs_1998['City'] = money_dfs_1998['City'].str.replace(', United States', '')

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=1998)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_1998['Date (1998)'] = money_dfs_1998['Date (1998)'].apply(convert_date)

money_dfs_1998 = money_dfs_1998.drop(21)
money_dfs_1998 = money_dfs_1998.reset_index(drop=True)

money_dfs_1998 = money_dfs_1998.rename(columns={'Date (1998)': 'Date'})


money_dfs[5] = money_dfs_1998


money_dfs[5]



Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,04/02/1998,Uniondale,Nassau Veterans Memorial Coliseum,34348,34348,824328
1,04/03/1998,Uniondale,Nassau Veterans Memorial Coliseum,34348,34348,824328
2,07/16/1998,George,The Gorge Amphitheatre,31544,40000,854900
3,07/17/1998,George,The Gorge Amphitheatre,31544,40000,854900
4,08/02/1998,Noblesville,Deer Creek Music Center,42158,42158,1092811
5,08/03/1998,Noblesville,Deer Creek Music Center,42158,42158,1092811
6,08/15/1998,Limestone,Loring Air Force Base,105836,105836,4012715
7,08/16/1998,Limestone,Loring Air Force Base,105836,105836,4012715
8,10/30/1998,Las Vegas,Thomas & Mack Center,35635,35635,935485
9,10/31/1998,Las Vegas,Thomas & Mack Center,35635,35635,935485


In [20]:
money_dfs_1999 = money_dfs[6]
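# Strip the dollar sign; regex=False treats '$' as a literal character
money_dfs_1999['Gross'] = money_dfs_1999['Gross'].str.replace('$', '', regex=False)

# Split "attendance / capacity" into two columns
money_dfs_1999_split = money_dfs_1999['Attendance'].str.split('/', expand=True)
money_dfs_1999['Attendance'] = money_dfs_1999_split[0]
money_dfs_1999['Capacity'] = money_dfs_1999_split[1]

desired_column_order = ['Date (1999)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
# Copy the reordered selection so later assignments do not trigger SettingWithCopyWarning
money_dfs_1999_reordered = money_dfs_1999[desired_column_order].copy()
money_dfs_1999 = money_dfs_1999_reordered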

money_dfs_1999['City'] = money_dfs_1999['City'].str.replace(', United States', '')

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=1999)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_1999['Date (1999)'] = money_dfs_1999['Date (1999)'].apply(convert_date)

money_dfs_1999 = money_dfs_1999.drop(21)
money_dfs_1999 = money_dfs_1999.reset_index(drop=True)

money_dfs_1999 = money_dfs_1999.rename(columns={'Date (1999)': 'Date'})


money_dfs[6] = money_dfs_1999


money_dfs[6]



Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,07/03/1999,Atlanta,Lakewood Amphitheatre,37822,37822,1057431
1,07/04/1999,Atlanta,Lakewood Amphitheatre,37822,37822,1057431
2,07/17/1999,Volney,Oswego County Airport,101172,101172,3839730
3,07/18/1999,Volney,Oswego County Airport,101172,101172,3839730
4,07/25/1999,Noblesville,Deer Creek Music Center,41553,41553,1101155
5,07/26/1999,Noblesville,Deer Creek Music Center,41553,41553,1101155
6,09/10/1999,George,The Gorge Amphitheatre,29383,40000,849713
7,09/11/1999,George,The Gorge Amphitheatre,29383,40000,849713
8,10/03/1999,Rosemont,Allstate Arena,17963,17963,495065
9,10/07/1999,Uniondale,Nassau Veterans Memorial Coliseum,30977,36016,772341
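
Once Attendance and Capacity are numeric, the Attendance/Capacity question from the introduction reduces to a ratio per city. The sketch below uses three rows copied from the 1999 output above and plain Python containers. The function name `fill_rate_by_city` is hypothetical, and the values are assumed to have already been converted to integers.

```python
from collections import defaultdict

# (city, attendance, capacity) copied from the 1999 table above
shows_1999 = [
    ("George", 29383, 40000),
    ("Rosemont", 17963, 17963),
    ("Uniondale", 30977, 36016),
]


def fill_rate_by_city(rows):
    """Return {city: total attendance / total capacity} for the given rows."""
    totals = defaultdict(lambda: [0, 0])
    for city, attendance, capacity in rows:
        totals[city][0] += attendance
        totals[city][1] += capacity
    # Skip cities with zero capacity to avoid dividing by zero
    return {
        city: attended / capacity
        for city, (attended, capacity) in totals.items()
        if capacity
    }


for city, rate in fill_rate_by_city(shows_1999).items():
    print(f"{city}: {rate:.1%}")
```

Summing attendance and capacity before dividing weights each show by its size, which differs from averaging the per-show ratios.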


In [21]:
money_dfs_2000 = money_dfs[7]
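# Strip the dollar sign; regex=False treats '$' as a literal character
money_dfs_2000['Gross'] = money_dfs_2000['Gross'].str.replace('$', '', regex=False)

# Split "attendance / capacity" into two columns
money_dfs_2000_split = money_dfs_2000['Attendance'].str.split('/', expand=True)
money_dfs_2000['Attendance'] = money_dfs_2000_split[0]
money_dfs_2000['Capacity'] = money_dfs_2000_split[1]

desired_column_order = ['Date (2000)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
# Copy the reordered selection so later assignments do not trigger SettingWithCopyWarning
money_dfs_2000_reordered = money_dfs_2000[desired_column_order].copy()
money_dfs_2000 = money_dfs_2000_reordered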

money_dfs_2000['City'] = money_dfs_2000['City'].str.replace(', United States', '')

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2000)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2000['Date (2000)'] = money_dfs_2000['Date (2000)'].apply(convert_date)

money_dfs_2000 = money_dfs_2000.drop(8)
money_dfs_2000 = money_dfs_2000.reset_index(drop=True)

money_dfs_2000 = money_dfs_2000.rename(columns={'Date (2000)': 'Date'})


money_dfs[7] = money_dfs_2000


money_dfs[7]



Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,07/10/2000,Noblesville,Deer Creek Music Center,74212,74212,2040888
1,07/11/2000,Noblesville,Deer Creek Music Center,74212,74212,2040888
2,07/12/2000,Noblesville,Deer Creek Music Center,74212,74212,2040888
3,09/15/2000,Hershey,Hersheypark Stadium,30034,30034,847505
4,09/22/2000,Rosemont,Allstate Arena,36447,36447,1011582
5,09/23/2000,Rosemont,Allstate Arena,36447,36447,1011582
6,09/29/2000,Las Vegas,Thomas & Mack Center,35585,36500,978588
7,09/30/2000,Las Vegas,Thomas & Mack Center,35585,36500,978588


In [22]:
money_dfs_2002 = money_dfs[8]
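# Strip the dollar sign; regex=False treats '$' as a literal character
money_dfs_2002['Gross'] = money_dfs_2002['Gross'].str.replace('$', '', regex=False)

# Split "attendance / capacity" into two columns
money_dfs_2002_split = money_dfs_2002['Attendance'].str.split('/', expand=True)
money_dfs_2002['Attendance'] = money_dfs_2002_split[0]
money_dfs_2002['Capacity'] = money_dfs_2002_split[1]

desired_column_order = ['Date (2002)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
# Copy the reordered selection so later assignments do not trigger SettingWithCopyWarning
money_dfs_2002_reordered = money_dfs_2002[desired_column_order].copy()
money_dfs_2002 = money_dfs_2002_reordered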

money_dfs_2002['City'] = money_dfs_2002['City'].str.replace(', United States', '')

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2002)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2002['Date (2002)'] = money_dfs_2002['Date (2002)'].apply(convert_date)

money_dfs_2002 = money_dfs_2002.drop(0)
money_dfs_2002 = money_dfs_2002.reset_index(drop=True)

money_dfs_2002 = money_dfs_2002.rename(columns={'Date (2002)': 'Date'})


money_dfs[8] = money_dfs_2002


money_dfs[8]

  money_dfs_2002['Gross'] = money_dfs_2002['Gross'].str.replace('$', '')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  money_dfs_2002['City'] = money_dfs_2002['City'].str.replace(', United States', '')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  money_dfs_2002['Date (2002)'] = money_dfs_2002['Date (2002)'].apply(convert_date)


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,12/31/2002,New York City,Madison Square Garden,18966,18966,824940


In [23]:
money_dfs_2003 = money_dfs[9]
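# Strip the dollar sign; regex=False treats '$' as a literal character
money_dfs_2003['Gross'] = money_dfs_2003['Gross'].str.replace('$', '', regex=False)

# Split "attendance / capacity" into two columns
money_dfs_2003_split = money_dfs_2003['Attendance'].str.split('/', expand=True)
money_dfs_2003['Attendance'] = money_dfs_2003_split[0]
money_dfs_2003['Capacity'] = money_dfs_2003_split[1]

desired_column_order = ['Date (2003)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
# Copy the reordered selection so later assignments do not trigger SettingWithCopyWarning
money_dfs_2003_reordered = money_dfs_2003[desired_column_order].copy()
money_dfs_2003 = money_dfs_2003_reordered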

money_dfs_2003['City'] = money_dfs_2003['City'].str.replace(', United States', '')

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2003)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2003['Date (2003)'] = money_dfs_2003['Date (2003)'].apply(convert_date)

money_dfs_2003 = money_dfs_2003.drop(0)
money_dfs_2003 = money_dfs_2003.drop(4)
money_dfs_2003 = money_dfs_2003.drop(17)
money_dfs_2003 = money_dfs_2003.drop(37)
money_dfs_2003 = money_dfs_2003.drop(40)
money_dfs_2003 = money_dfs_2003.drop(45)
money_dfs_2003 = money_dfs_2003.drop(50)
money_dfs_2003 = money_dfs_2003.reset_index(drop=True)

money_dfs_2003 = money_dfs_2003.rename(columns={'Date (2003)': 'Date'})

money_dfs_2003 = money_dfs_2003.fillna(0)

money_dfs[9] = money_dfs_2003


money_dfs[9]



  money_dfs_2003['Gross'] = money_dfs_2003['Gross'].str.replace('$', '')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  money_dfs_2003['City'] = money_dfs_2003['City'].str.replace(', United States', '')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  money_dfs_2003['Date (2003)'] = money_dfs_2003['Date (2003)'].apply(convert_date)


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,01/02/2003,Hampton,Hampton Coliseum,41400,41400,1559173
1,01/03/2003,Hampton,Hampton Coliseum,41400,41400,1559173
2,01/04/2003,Hampton,Hampton Coliseum,41400,41400,1559173
3,02/14/2003,Inglewood,Great Western Forum,17436,17517,645863
4,02/15/2003,Las Vegas,Thomas & Mack Center,35905,35905,1418248
5,02/16/2003,Las Vegas,Thomas & Mack Center,35905,35905,1418248
6,02/18/2003,Denver,Pepsi Center,17767,17767,666263
7,02/20/2003,Rosemont,Allstate Arena,18355,18355,688313
8,02/21/2003,Cincinnati,U.S. Bank Arena,0,0,0
9,02/22/2003,Cincinnati,U.S. Bank Arena,0,0,0


In [24]:
money_dfs_2004 = money_dfs[10]
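# Strip the dollar sign; regex=False treats '$' as a literal character
money_dfs_2004['Gross'] = money_dfs_2004['Gross'].str.replace('$', '', regex=False)

# Split "attendance / capacity" into two columns
money_dfs_2004_split = money_dfs_2004['Attendance'].str.split('/', expand=True)
money_dfs_2004['Attendance'] = money_dfs_2004_split[0]
money_dfs_2004['Capacity'] = money_dfs_2004_split[1]

desired_column_order = ['Date (2004)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
# Copy the reordered selection so later assignments do not trigger SettingWithCopyWarning
money_dfs_2004_reordered = money_dfs_2004[desired_column_order].copy()
money_dfs_2004 = money_dfs_2004_reordered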

money_dfs_2004['City'] = money_dfs_2004['City'].str.replace(', United States', '')

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2004)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2004['Date (2004)'] = money_dfs_2004['Date (2004)'].apply(convert_date)

money_dfs_2004 = money_dfs_2004.drop(0)
money_dfs_2004 = money_dfs_2004.drop(4)
money_dfs_2004 = money_dfs_2004.drop(13)
money_dfs_2004 = money_dfs_2004.drop(18)
money_dfs_2004 = money_dfs_2004.drop(22)
money_dfs_2004 = money_dfs_2004.reset_index(drop=True)

money_dfs_2004 = money_dfs_2004.rename(columns={'Date (2004)': 'Date'})

money_dfs_2004 = money_dfs_2004.fillna(0)

money_dfs[10] = money_dfs_2004


money_dfs[10] 

  money_dfs_2004['Gross'] = money_dfs_2004['Gross'].str.replace('$', '')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  money_dfs_2004['City'] = money_dfs_2004['City'].str.replace(', United States', '')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  money_dfs_2004['Date (2004)'] = money_dfs_2004['Date (2004)'].apply(convert_date)


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,04/15/2004,Las Vegas,Thomas & Mack Center,53815,53815,2287138
1,04/16/2004,Las Vegas,Thomas & Mack Center,53815,53815,2287138
2,04/17/2004,Las Vegas,Thomas & Mack Center,53815,53815,2287138
3,06/17/2004,Brooklyn,KeySpan Park,0,0,0
4,06/18/2004,Brooklyn,KeySpan Park,0,0,0
5,06/19/2004,Saratoga Springs,Saratoga Performing Arts Center,50081,50240,2082458
6,06/20/2004,Saratoga Springs,Saratoga Performing Arts Center,50081,50240,2082458
7,06/23/2004,Noblesville,Verizon Wireless Music Center,48607,48607,1902574
8,06/24/2004,Noblesville,Verizon Wireless Music Center,48607,48607,1902574
9,06/25/2004,East Troy,Alpine Valley Music Theatre,64969,70093,2543022


In [25]:
money_dfs_2009 = money_dfs[11]
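# Strip the dollar sign; regex=False treats '$' as a literal character
money_dfs_2009['Gross'] = money_dfs_2009['Gross'].str.replace('$', '', regex=False)

# Split "attendance / capacity" into two columns
money_dfs_2009_split = money_dfs_2009['Attendance'].str.split('/', expand=True)
money_dfs_2009['Attendance'] = money_dfs_2009_split[0]
money_dfs_2009['Capacity'] = money_dfs_2009_split[1]

desired_column_order = ['Date (2009)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
# Copy the reordered selection so later assignments do not trigger SettingWithCopyWarning
money_dfs_2009_reordered = money_dfs_2009[desired_column_order].copy()
money_dfs_2009 = money_dfs_2009_reordered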

money_dfs_2009['City'] = money_dfs_2009['City'].str.replace(', United States', '')

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2009)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2009['Date (2009)'] = money_dfs_2009['Date (2009)'].apply(convert_date)

money_dfs_2009 = money_dfs_2009.drop([0, 4, 20, 33, 37, 51, 56])
money_dfs_2009 = money_dfs_2009.reset_index(drop=True)

money_dfs_2009 = money_dfs_2009.rename(columns={'Date (2009)': 'Date'})

money_dfs_2009 = money_dfs_2009.fillna(0)

money_dfs[11] = money_dfs_2009


money_dfs[11] 


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,03/06/2009,Hampton,Hampton Coliseum,0,0,0
1,03/07/2009,Hampton,Hampton Coliseum,0,0,0
2,03/08/2009,Hampton,Hampton Coliseum,0,0,0
3,05/31/2009,Boston,Fenway Park,34906,34906,1710423
4,06/02/2009,Wantagh,Nikon at Jones Beach Theater,0,0,0
5,06/04/2009,Wantagh,Nikon at Jones Beach Theater,0,0,0
6,06/05/2009,Wantagh,Nikon at Jones Beach Theater,0,0,0
7,06/06/2009,Mansfield,Comcast Center,0,0,0
8,06/07/2009,Camden,Susquehanna Bank Center,24958,24958,1232116
9,06/09/2009,Asheville,Asheville Civic Center,0,0,0
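
The per-year cells in this part repeat the same steps with a different year and a different list of rows to drop. A minimal sketch of how those steps could be factored into one function is shown here. The name `clean_money_df`, its parameters, and the sample values in `raw` are hypothetical and not part of the notebook; the sketch also strips whitespace around the split values and appends the year before parsing, which the original cells do not do.

```python
import pandas as pd

def clean_money_df(df, year, drop_rows=()):
    """Hypothetical helper: clean one scraped box-office table for `year`."""
    date_col = f"Date ({year})"
    out = df.copy()  # copy first so later assignments never touch a slice

    # Strip the literal '$' and split "attendance / capacity" into two columns
    out["Gross"] = out["Gross"].str.replace("$", "", regex=False)
    split = out["Attendance"].str.split("/", expand=True)
    out["Attendance"] = split[0].str.strip()
    out["Capacity"] = split[1].str.strip() if split.shape[1] > 1 else None
    out["City"] = out["City"].str.replace(", United States", "", regex=False)

    # "May 31" -> "05/31/<year>"; values that are not dates are left unchanged
    with_year = out[date_col].astype(str) + f" {year}"
    parsed = pd.to_datetime(with_year, format="%B %d %Y", errors="coerce")
    out[date_col] = parsed.dt.strftime("%m/%d/%Y").fillna(out[date_col])

    out = out[[date_col, "City", "Venue", "Attendance", "Capacity", "Gross"]]
    out = out.drop(index=list(drop_rows)).reset_index(drop=True)
    # object dtype lets the 0 placeholder sit next to string values
    return out.rename(columns={date_col: "Date"}).astype(object).fillna(0)

# Made-up sample in the assumed raw format (row 0 stands in for a header row)
raw = pd.DataFrame({
    "Date (2009)": ["Summer Tour", "May 31", "June 7"],
    "City": ["Summer Tour", "Boston, United States", "Camden, United States"],
    "Venue": ["Summer Tour", "Fenway Park", "Susquehanna Bank Center"],
    "Attendance": ["Summer Tour", "34,906 / 34,906", None],
    "Gross": ["Summer Tour", "$1,710,423", None],
})
clean = clean_money_df(raw, 2009, drop_rows=[0])
print(clean)
```

With such a helper, the 2009 cell would reduce to something like `money_dfs[11] = clean_money_df(money_dfs[11], 2009, drop_rows=[0, 4, 20, 33, 37, 51, 56])`.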


In [26]:
money_dfs_2010 = money_dfs[12].copy()
money_dfs_2010['Gross'] = money_dfs_2010['Gross'].str.replace('$', '', regex=False)

money_dfs_2010_split = money_dfs_2010['Attendance'].str.split('/', expand=True)
money_dfs_2010['Attendance'] = money_dfs_2010_split[0]
money_dfs_2010['Capacity'] = money_dfs_2010_split[1]

desired_column_order = ['Date (2010–11)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
money_dfs_2010 = money_dfs_2010[desired_column_order].copy()

money_dfs_2010['City'] = money_dfs_2010['City'].str.replace(', United States', '', regex=False)

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2010)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2010['Date (2010–11)'] = money_dfs_2010['Date (2010–11)'].apply(convert_date)

money_dfs_2010 = money_dfs_2010.drop([0, 19, 31, 47, 53])
money_dfs_2010 = money_dfs_2010.reset_index(drop=True)

money_dfs_2010 = money_dfs_2010.rename(columns={'Date (2010–11)': 'Date'})

new_date = "01/01/2011"
money_dfs_2010.at[48, 'Date'] = new_date

money_dfs_2010 = money_dfs_2010.fillna(0)

money_dfs[12] = money_dfs_2010

money_dfs[12]



Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,06/11/2010,Bridgeview,Toyota Park,22293,22293,1036625
1,06/12/2010,Cuyahoga Falls,Blossom Music Center,14726,20351,736300
2,06/13/2010,Hershey,Hersheypark Stadium,14261,30223,713050
3,06/15/2010,Portsmouth,nTelos Wireless Pavilion,0,0,0
4,06/17/2010,Hartford,Comcast Theatre,32610,49608,1900500
5,06/18/2010,Hartford,Comcast Theatre,32610,49608,1900500
6,06/19/2010,Saratoga Springs,Saratoga Performing Arts Center,45176,50157,2258800
7,06/20/2010,Saratoga Springs,Saratoga Performing Arts Center,45176,50157,2258800
8,06/22/2010,Mansfield,Comcast Center,19729,19729,986450
9,06/24/2010,Camden,Susquehanna Bank Center,37247,49440,1965934
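
The 2010–11 table spans a year boundary, which the cell above handles by stamping every date with 2010 and then overwriting row 48 by hand. As a rough alternative, the year could be inferred from the order of the dates. The sketch below assumes the dates are listed chronologically and bumps the year whenever the month number drops; `add_years` is a hypothetical name, not something defined in the notebook.

```python
import pandas as pd

def add_years(dates, start_year):
    """Attach years to 'Month day' strings listed in chronological order.

    Hypothetical helper: the year is incremented whenever the month number
    drops (e.g. December followed by January). Entries that do not parse
    as dates are returned unchanged.
    """
    year = start_year
    prev_month = None
    out = []
    for text in dates:
        # 2000 is only a placeholder year used for parsing
        parsed = pd.to_datetime(f"{text} 2000", format="%B %d %Y", errors="coerce")
        if pd.isnull(parsed):
            out.append(text)
            continue
        if prev_month is not None and parsed.month < prev_month:
            year += 1
        prev_month = parsed.month
        out.append(f"{parsed.month:02d}/{parsed.day:02d}/{year}")
    return out

print(add_years(["December 30", "December 31", "January 1"], 2010))
# ['12/30/2010', '12/31/2010', '01/01/2011']
```

This avoids depending on a fixed row position, so the result would not shift if rows were added to or removed from the scraped table.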


In [27]:
money_dfs_2011 = money_dfs[13].copy()
money_dfs_2011['Gross'] = money_dfs_2011['Gross'].str.replace('$', '', regex=False)

money_dfs_2011_split = money_dfs_2011['Attendance'].str.split('/', expand=True)
money_dfs_2011['Attendance'] = money_dfs_2011_split[0]
money_dfs_2011['Capacity'] = money_dfs_2011_split[1]

desired_column_order = ['Date (2011)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
money_dfs_2011 = money_dfs_2011[desired_column_order].copy()

money_dfs_2011['City'] = money_dfs_2011['City'].str.replace(', United States', '', regex=False)

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2011)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2011['Date (2011)'] = money_dfs_2011['Date (2011)'].apply(convert_date)

money_dfs_2011 = money_dfs_2011.drop(10)
money_dfs_2011 = money_dfs_2011.reset_index(drop=True)

money_dfs_2011 = money_dfs_2011.rename(columns={'Date (2011)': 'Date'})

money_dfs[13] = money_dfs_2011

money_dfs[13]


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,06/03/2011,Clarkston,DTE Energy Music Theatre,11233,15274,557283
1,08/09/2011,Stateline,Harveys Outdoor Arena,17221,17221,861050
2,08/10/2011,Stateline,Harveys Outdoor Arena,17221,17221,861050
3,08/15/2011,Chicago,UIC Pavilion,27476,27476,1593608
4,08/16/2011,Chicago,UIC Pavilion,27476,27476,1593608
5,08/17/2011,Chicago,UIC Pavilion,27476,27476,1593608
6,12/28/2011,New York City,Madison Square Garden,75707,75707,4387679
7,12/29/2011,New York City,Madison Square Garden,75707,75707,4387679
8,12/30/2011,New York City,Madison Square Garden,75707,75707,4387679
9,12/31/2011,New York City,Madison Square Garden,75707,75707,4387679


In [28]:
money_dfs_2012 = money_dfs[14].copy()
money_dfs_2012['Gross'] = money_dfs_2012['Gross'].str.replace('$', '', regex=False)

money_dfs_2012_split = money_dfs_2012['Attendance'].str.split('/', expand=True)
money_dfs_2012['Attendance'] = money_dfs_2012_split[0]
money_dfs_2012['Capacity'] = money_dfs_2012_split[1]

desired_column_order = ['Date (2012)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
money_dfs_2012 = money_dfs_2012[desired_column_order].copy()

money_dfs_2012['City'] = money_dfs_2012['City'].str.replace(', United States', '', regex=False)

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2012)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2012['Date (2012)'] = money_dfs_2012['Date (2012)'].apply(convert_date)

money_dfs_2012 = money_dfs_2012.drop([0, 21, 35, 40])
money_dfs_2012 = money_dfs_2012.reset_index(drop=True)

money_dfs_2012 = money_dfs_2012.rename(columns={'Date (2012)': 'Date'})

money_dfs_2012 = money_dfs_2012.fillna(0)

money_dfs[14] = money_dfs_2012

money_dfs[14]


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,06/07/2012,Worcester,DCU Center,25346,28666,1520760
1,06/08/2012,Worcester,DCU Center,25346,28666,1520760
2,06/10/2012,Manchester,Bonnaroo Music and Arts Festival,0,0,0
3,06/15/2012,Atlantic City,Bader Field,0,0,0
4,06/16/2012,Atlantic City,Bader Field,0,0,0
5,06/17/2012,Atlantic City,Bader Field,0,0,0
6,06/19/2012,Portsmouth,nTelos Wireless Pavilion,13780,13780,827400
7,06/20/2012,Portsmouth,nTelos Wireless Pavilion,13780,13780,827400
8,06/22/2012,Cincinnati,Riverbend Music Center,11075,20500,581400
9,06/23/2012,Burgettstown,First Niagara Pavilion,12925,23085,683220
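
The row labels passed to `drop` are hard-coded for each year, so they would need updating whenever the source table changes. If the dropped rows are the ones whose date cell does not hold a real date (an assumption; the notebook does not say what those rows contain), they could instead be filtered out by checking which dates parse. The function name and the sample rows below are hypothetical.

```python
import pandas as pd

def drop_unparsed_dates(df, date_col="Date"):
    """Keep only rows whose date matches MM/DD/YYYY (hypothetical helper)."""
    parsed = pd.to_datetime(df[date_col], format="%m/%d/%Y", errors="coerce")
    return df[parsed.notna()].reset_index(drop=True)

# Made-up sample: two show rows surrounded by non-date rows
sample = pd.DataFrame({
    "Date": ["Summer Tour", "06/07/2012", "06/08/2012", "Total"],
    "City": ["Summer Tour", "Worcester", "Worcester", "Total"],
})
shows_only = drop_unparsed_dates(sample)
print(shows_only)
```

This would run after `convert_date` has been applied, since it relies on real dates already being in `MM/DD/YYYY` form.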


In [29]:
money_dfs_2013 = money_dfs[15].copy()
money_dfs_2013['Gross'] = money_dfs_2013['Gross'].str.replace('$', '', regex=False)

money_dfs_2013_split = money_dfs_2013['Attendance'].str.split('/', expand=True)
money_dfs_2013['Attendance'] = money_dfs_2013_split[0]
money_dfs_2013['Capacity'] = money_dfs_2013_split[1]

desired_column_order = ['Date (2013)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
money_dfs_2013 = money_dfs_2013[desired_column_order].copy()

money_dfs_2013['City'] = money_dfs_2013['City'].str.replace(', United States', '', regex=False)

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2013)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2013['Date (2013)'] = money_dfs_2013['Date (2013)'].apply(convert_date)

money_dfs_2013 = money_dfs_2013.drop([0, 26, 39, 44])
money_dfs_2013 = money_dfs_2013.reset_index(drop=True)

money_dfs_2013 = money_dfs_2013.rename(columns={'Date (2013)': 'Date'})

money_dfs[15] = money_dfs_2013

money_dfs[15] 


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,07/03/2013,Bangor,Darling's Waterfront Pavilion,13977,16000,840455
1,07/05/2013,Saratoga Springs,Saratoga Performing Arts Center,66695,77867,2324855
2,07/06/2013,Saratoga Springs,Saratoga Performing Arts Center,66695,77867,2324855
3,07/07/2013,Saratoga Springs,Saratoga Performing Arts Center,66695,77867,2324855
4,07/10/2013,Holmdel Township,PNC Bank Arts Center,16720,16907,847395
5,07/12/2013,Wantagh,Nikon at Jones Beach Theater,14252,14252,855120
6,07/13/2013,Columbia,Merriweather Post Pavilion,35103,39124,1741095
7,07/14/2013,Columbia,Merriweather Post Pavilion,35103,39124,1741095
8,07/16/2013,Alpharetta,Verizon Wireless Amphitheatre at Encore Park,23245,26000,1266060
9,07/17/2013,Alpharetta,Verizon Wireless Amphitheatre at Encore Park,23245,26000,1266060
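
To compare Attendance against Capacity later on, both columns need to be numeric, and the `0` placeholders written by `fillna(0)` must not be divided by. The following is a minimal sketch under the assumption that the cleaned values may still be strings with thousands separators; the sample frame and the `FillRate` column name are illustrative only.

```python
import pandas as pd

# Illustrative rows: one show with figures, one with the 0 placeholders
shows = pd.DataFrame({
    "Venue": ["Darling's Waterfront Pavilion", "Barceló Maya Beach"],
    "Attendance": ["13,977", 0],
    "Capacity": ["16,000", 0],
})

for col in ["Attendance", "Capacity"]:
    # Remove thousands separators, then convert; unparseable values become 0
    cleaned = shows[col].astype(str).str.replace(",", "", regex=False)
    shows[col] = pd.to_numeric(cleaned, errors="coerce").fillna(0).astype(int)

# A capacity of 0 marks a missing figure here, so mask it to avoid dividing by zero
shows["FillRate"] = shows["Attendance"] / shows["Capacity"].where(shows["Capacity"] > 0)
print(shows)
```

Rows without box-office figures end up with a missing `FillRate` instead of an error or an infinite value.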


In [30]:
money_dfs_2014 = money_dfs[16].copy()
money_dfs_2014['Gross'] = money_dfs_2014['Gross'].str.replace('$', '', regex=False)

money_dfs_2014_split = money_dfs_2014['Attendance'].str.split('/', expand=True)
money_dfs_2014['Attendance'] = money_dfs_2014_split[0]
money_dfs_2014['Capacity'] = money_dfs_2014_split[1]

desired_column_order = ['Date (2014)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
money_dfs_2014 = money_dfs_2014[desired_column_order].copy()

money_dfs_2014['City'] = money_dfs_2014['City'].str.replace(', United States', '', regex=False)

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2014)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2014['Date (2014)'] = money_dfs_2014['Date (2014)'].apply(convert_date)

money_dfs_2014 = money_dfs_2014.drop([0, 2, 28, 41, 43])
money_dfs_2014 = money_dfs_2014.reset_index(drop=True)

money_dfs_2014 = money_dfs_2014.rename(columns={'Date (2014)': 'Date'})

money_dfs_2014 = money_dfs_2014.fillna(0)

money_dfs[16] = money_dfs_2014

money_dfs[16]


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,04/26/2014,New Orleans,New Orleans Jazz & Heritage Festival,0,0,0
1,07/01/2014,Mansfield,Xfinity Center,17387,19900,971325
2,07/03/2014,Saratoga Springs,Saratoga Performing Arts Center,52730,75759,2602185
3,07/04/2014,Saratoga Springs,Saratoga Performing Arts Center,52730,75759,2602185
4,07/05/2014,Saratoga Springs,Saratoga Performing Arts Center,52730,75759,2602185
5,07/08/2014,Philadelphia,Mann Center for the Performing Arts,24804,25000,1308840
6,07/09/2014,Philadelphia,Mann Center for the Performing Arts,24804,25000,1308840
7,07/11/2014,New York City,Randall's Island,55372,90000,3062580
8,07/12/2014,New York City,Randall's Island,55372,90000,3062580
9,07/13/2014,New York City,Randall's Island,55372,90000,3062580


In [31]:
money_dfs_2015 = money_dfs[17].copy()
money_dfs_2015['Gross'] = money_dfs_2015['Gross'].str.replace('$', '', regex=False)

money_dfs_2015_split = money_dfs_2015['Attendance'].str.split('/', expand=True)
money_dfs_2015['Attendance'] = money_dfs_2015_split[0]
money_dfs_2015['Capacity'] = money_dfs_2015_split[1]

desired_column_order = ['Date (2015)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
money_dfs_2015 = money_dfs_2015[desired_column_order].copy()

money_dfs_2015['City'] = money_dfs_2015['City'].str.replace(', United States', '', regex=False)

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2015)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2015['Date (2015)'] = money_dfs_2015['Date (2015)'].apply(convert_date)

money_dfs_2015 = money_dfs_2015.drop([0, 4, 30, 33])
money_dfs_2015 = money_dfs_2015.reset_index(drop=True)

money_dfs_2015 = money_dfs_2015.rename(columns={'Date (2015)': 'Date'})

money_dfs_2015 = money_dfs_2015.fillna(0)

money_dfs[17] = money_dfs_2015

money_dfs[17]


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,01/01/2015,Miami,AmericanAirlines Arena,0,0,0
1,01/02/2015,Miami,AmericanAirlines Arena,0,0,0
2,01/03/2015,Miami,AmericanAirlines Arena,0,0,0
3,07/21/2015,Bend,Les Schwab Amphitheater,15999,15999,1039935
4,07/22/2015,Bend,Les Schwab Amphitheater,15999,15999,1039935
5,07/24/2015,Mountain View,Shoreline Amphitheatre,15173,22000,805845
6,07/25/2015,Inglewood,The Forum,12388,14550,715185
7,07/28/2015,Austin,Austin360 Amphitheater,10170,13164,601710
8,07/29/2015,Grand Prairie,Verizon Theatre,6455,6631,419575
9,07/31/2015,Atlanta,Aaron's Amphitheatre at Lakewood,26451,37736,1449755


In [32]:
money_dfs_2016 = money_dfs[18].copy()
money_dfs_2016['Gross'] = money_dfs_2016['Gross'].str.replace('$', '', regex=False)

money_dfs_2016_split = money_dfs_2016['Attendance'].str.split('/', expand=True)
money_dfs_2016['Attendance'] = money_dfs_2016_split[0]
money_dfs_2016['Capacity'] = money_dfs_2016_split[1]

desired_column_order = ['Date (2016)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
money_dfs_2016 = money_dfs_2016[desired_column_order].copy()

money_dfs_2016['City'] = money_dfs_2016['City'].str.replace(', United States', '', regex=False)

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2016)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2016['Date (2016)'] = money_dfs_2016['Date (2016)'].apply(convert_date)

money_dfs_2016 = money_dfs_2016.drop([0, 3, 7, 31, 45, 50])
money_dfs_2016 = money_dfs_2016.reset_index(drop=True)

money_dfs_2016 = money_dfs_2016.rename(columns={'Date (2016)': 'Date'})

money_dfs_2016 = money_dfs_2016.fillna(0)

money_dfs[18] = money_dfs_2016

money_dfs[18]


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,01/01/2016,New York City,Madison Square Garden,0,0,0
1,01/02/2016,New York City,Madison Square Garden,0,0,0
2,01/15/2016,Playa del Carmen,Barceló Maya Beach,0,0,0
3,01/16/2016,Playa del Carmen,Barceló Maya Beach,0,0,0
4,01/17/2016,Playa del Carmen,Barceló Maya Beach,0,0,0
5,06/22/2016,Saint Paul,Xcel Energy Center,0,0,0
6,06/24/2016,Chicago,Wrigley Field,83588,84356,4761063
7,06/25/2016,Chicago,Wrigley Field,83588,84356,4761063
8,06/26/2016,Noblesville,Klipsch Music Center,17865,24369,738703
9,06/28/2016,Philadelphia,Mann Center for the Performing Arts,24852,25160,1374580


In [33]:
money_dfs_2017 = money_dfs[19].copy()
money_dfs_2017['Gross'] = money_dfs_2017['Gross'].str.replace('$', '', regex=False)

money_dfs_2017_split = money_dfs_2017['Attendance'].str.split('/', expand=True)
money_dfs_2017['Attendance'] = money_dfs_2017_split[0]
money_dfs_2017['Capacity'] = money_dfs_2017_split[1]

desired_column_order = ['Date (2017)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
money_dfs_2017 = money_dfs_2017[desired_column_order].copy()

money_dfs_2017['City'] = money_dfs_2017['City'].str.replace(', United States', '', regex=False)

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2017)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2017['Date (2017)'] = money_dfs_2017['Date (2017)'].apply(convert_date)

money_dfs_2017 = money_dfs_2017.drop([0, 4, 10, 24, 28, 33])
money_dfs_2017 = money_dfs_2017.reset_index(drop=True)

money_dfs_2017 = money_dfs_2017.rename(columns={'Date (2017)': 'Date'})

money_dfs_2017 = money_dfs_2017.fillna(0)

money_dfs[19] = money_dfs_2017

money_dfs[19]


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,01/13/2017,Playa del Carmen,Barceló Maya Beach,0,0,0
1,01/14/2017,Playa del Carmen,Barceló Maya Beach,0,0,0
2,01/15/2017,Playa del Carmen,Barceló Maya Beach,0,0,0
3,07/14/2017,Chicago,Huntington Bank Pavilion at Northerly Island,49817,78174,2306566
4,07/15/2017,Chicago,Huntington Bank Pavilion at Northerly Island,49817,78174,2306566
5,07/16/2017,Chicago,Huntington Bank Pavilion at Northerly Island,49817,78174,2306566
6,07/18/2017,Fairborn,Wright State University Nutter Center,11266,11295,679471
7,07/19/2017,Pittsburgh,Petersen Events Center,10375,12224,562947
8,07/21/2017,New York City,Madison Square Garden,227385,236278,15041405
9,07/22/2017,New York City,Madison Square Garden,227385,236278,15041405


In [34]:
money_dfs_2018 = money_dfs[20].copy()
money_dfs_2018['Gross'] = money_dfs_2018['Gross'].str.replace('$', '', regex=False)

money_dfs_2018_split = money_dfs_2018['Attendance'].str.split('/', expand=True)
money_dfs_2018['Attendance'] = money_dfs_2018_split[0]
money_dfs_2018['Capacity'] = money_dfs_2018_split[1]

desired_column_order = ['Date (2018)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
money_dfs_2018 = money_dfs_2018[desired_column_order].copy()

money_dfs_2018['City'] = money_dfs_2018['City'].str.replace(', United States', '', regex=False)

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2018)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2018['Date (2018)'] = money_dfs_2018['Date (2018)'].apply(convert_date)

money_dfs_2018 = money_dfs_2018.drop([0, 25, 40, 45])
money_dfs_2018 = money_dfs_2018.reset_index(drop=True)

money_dfs_2018 = money_dfs_2018.rename(columns={'Date (2018)': 'Date'})

money_dfs_2018 = money_dfs_2018.fillna(0)

money_dfs[20] = money_dfs_2018

money_dfs[20]


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,07/17/2018,Stateline,Lake Tahoe Outdoor Arena,17150,17150,1269027
1,07/18/2018,Stateline,Lake Tahoe Outdoor Arena,17150,17150,1269027
2,07/20/2018,George,The Gorge Amphitheatre,0,0,0
3,07/21/2018,George,The Gorge Amphitheatre,0,0,0
4,07/22/2018,George,The Gorge Amphitheatre,0,0,0
5,07/24/2018,San Francisco,Bill Graham Civic Auditorium,17507,17507,1399840
6,07/25/2018,San Francisco,Bill Graham Civic Auditorium,17507,17507,1399840
7,07/27/2018,Inglewood,The Forum,23482,23482,1642872
8,07/28/2018,Inglewood,The Forum,23482,23482,1642872
9,07/31/2018,Austin,Austin360 Amphitheater,0,0,0


In [35]:
money_dfs_2019 = money_dfs[21].copy()
money_dfs_2019['Gross'] = money_dfs_2019['Gross'].str.replace('$', '', regex=False)

money_dfs_2019_split = money_dfs_2019['Attendance'].str.split('/', expand=True)
money_dfs_2019['Attendance'] = money_dfs_2019_split[0]
money_dfs_2019['Capacity'] = money_dfs_2019_split[1]

desired_column_order = ['Date (2019)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
money_dfs_2019 = money_dfs_2019[desired_column_order].copy()

money_dfs_2019['City'] = money_dfs_2019['City'].str.replace(', United States', '', regex=False)

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2019)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2019['Date (2019)'] = money_dfs_2019['Date (2019)'].apply(convert_date)

money_dfs_2019 = money_dfs_2019.drop([0, 4, 31, 39, 44])
money_dfs_2019 = money_dfs_2019.reset_index(drop=True)

money_dfs_2019 = money_dfs_2019.rename(columns={'Date (2019)': 'Date'})

money_dfs_2019 = money_dfs_2019.fillna(0)

money_dfs[21] = money_dfs_2019

money_dfs[21]


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,02/21/2019,Playa del Carmen,Barceló Maya Beach,0,0,0
1,02/22/2019,Playa del Carmen,Barceló Maya Beach,0,0,0
2,02/23/2019,Playa del Carmen,Barceló Maya Beach,0,0,0
3,06/11/2019,St. Louis,Chaifetz Arena,17464,20601,1215751
4,06/12/2019,St. Louis,Chaifetz Arena,17464,20601,1215751
5,06/14/2019,Manchester,Bonnaroo Music and Arts Festival,0,0,0
6,06/16/2019,Manchester,Bonnaroo Music and Arts Festival,0,0,0
7,06/18/2019,Toronto,Budweiser Stage,0,0,0
8,06/19/2019,Cuyahoga Falls,Blossom Music Center,0,0,0
9,06/21/2019,Charlotte,PNC Music Pavilion,0,0,0


In [36]:
money_dfs_2020 = money_dfs[22].copy()

desired_column_order = ['Date (2020)', 'City', 'Venue', 'Attendance', 'Gross']
money_dfs_2020 = money_dfs_2020[desired_column_order].copy()

money_dfs_2020['City'] = money_dfs_2020['City'].str.replace(', Mexico', '', regex=False)

def convert_date(date_str):
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2020)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str
    
money_dfs_2020['Date (2020)'] = money_dfs_2020['Date (2020)'].apply(convert_date)

money_dfs_2020 = money_dfs_2020.drop(0)
money_dfs_2020 = money_dfs_2020.reset_index(drop=True)

money_dfs_2020 = money_dfs_2020.rename(columns={'Date (2020)': 'Date'})

# The 2020 table has no capacity figures; add an empty column (filled with 0 below)
money_dfs_2020['Capacity'] = None

money_dfs_2020 = money_dfs_2020.fillna(0)

money_dfs[22] = money_dfs_2020

money_dfs[22]


Unnamed: 0,Date,City,Venue,Attendance,Gross,Capacity
0,02/20/2020,Cancún,Moon Palace Resort,0,0,0
1,02/21/2020,Cancún,Moon Palace Resort,0,0,0
2,02/22/2020,Cancún,Moon Palace Resort,0,0,0
3,02/23/2020,Cancún,Moon Palace Resort,0,0,0


In [37]:
# Clean the 2021 box office table (position 23 in money_dfs)
money_dfs_2021 = money_dfs[23]
# regex=False treats '$' as a literal dollar sign rather than a regex anchor
money_dfs_2021['Gross'] = money_dfs_2021['Gross'].str.replace('$', '', regex=False)

# Split 'attendance/capacity' strings into two columns
money_dfs_2021_split = money_dfs_2021['Attendance'].str.split('/', expand=True)
money_dfs_2021['Attendance'] = money_dfs_2021_split[0]
money_dfs_2021['Capacity'] = money_dfs_2021_split[1]

# Select and order the columns; .copy() gives an independent DataFrame so the
# assignments below do not raise SettingWithCopyWarning
desired_column_order = ['Date (2021)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
money_dfs_2021 = money_dfs_2021[desired_column_order].copy()

money_dfs_2021['City'] = money_dfs_2021['City'].str.replace(', United States', '')

def convert_date(date_str):
    # Parse 'Month day' strings and return MM/DD/2021; other values pass through unchanged
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2021)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str

money_dfs_2021['Date (2021)'] = money_dfs_2021['Date (2021)'].apply(convert_date)

# Drop rows 0, 23, 37, and 39 and renumber the index
money_dfs_2021 = money_dfs_2021.drop([0, 23, 37, 39]).reset_index(drop=True)

money_dfs_2021 = money_dfs_2021.rename(columns={'Date (2021)': 'Date'})

money_dfs_2021 = money_dfs_2021.fillna(0)

money_dfs[23] = money_dfs_2021

money_dfs[23]


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,07/28/2021,Rogers,Walmart Arkansas Music Pavilion,0,0,0
1,07/30/2021,Pelham,Oak Mountain Amphitheatre,0,0,0
2,07/31/2021,Alpharetta,Ameris Bank Amphitheatre,25924,25924,1908306
3,08/01/2021,Alpharetta,Ameris Bank Amphitheatre,25924,25924,1908306
4,08/03/2021,Nashville,Ascend Amphitheater,0,0,0
5,08/04/2021,Nashville,Ascend Amphitheater,0,0,0
6,08/06/2021,Noblesville,Ruoff Music Center,70100,74652,4036443
7,08/07/2021,Noblesville,Ruoff Music Center,70100,74652,4036443
8,08/08/2021,Noblesville,Ruoff Music Center,70100,74652,4036443
9,08/10/2021,Hershey,Hersheypark Stadium,41703,54678,3385967


In [38]:
# Clean the 2022 box office table (position 24 in money_dfs)
money_dfs_2022 = money_dfs[24]
# regex=False treats '$' as a literal dollar sign rather than a regex anchor
money_dfs_2022['Gross'] = money_dfs_2022['Gross'].str.replace('$', '', regex=False)

# Split 'attendance/capacity' strings into two columns
money_dfs_2022_split = money_dfs_2022['Attendance'].str.split('/', expand=True)
money_dfs_2022['Attendance'] = money_dfs_2022_split[0]
money_dfs_2022['Capacity'] = money_dfs_2022_split[1]

# Select and order the columns; .copy() gives an independent DataFrame for the
# assignments below
desired_column_order = ['Date (2022)', 'City', 'Venue', 'Attendance', 'Capacity', 'Gross']
money_dfs_2022 = money_dfs_2022[desired_column_order].copy()

money_dfs_2022['City'] = money_dfs_2022['City'].str.replace(', United States', '')

def convert_date(date_str):
    # Parse 'Month day' strings and return MM/DD/2022; other values pass through unchanged
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2022)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str

money_dfs_2022['Date (2022)'] = money_dfs_2022['Date (2022)'].apply(convert_date)

# Drop row 15 and renumber the index
money_dfs_2022 = money_dfs_2022.drop(15).reset_index(drop=True)

money_dfs_2022 = money_dfs_2022.rename(columns={'Date (2022)': 'Date'})

money_dfs_2022 = money_dfs_2022.fillna(0)

money_dfs[24] = money_dfs_2022

money_dfs[24]


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,02/24/2022,"Cancún, Mexico",Moon Palace Resort,2423,2423,17710239
1,02/25/2022,"Cancún, Mexico",Moon Palace Resort,2423,2423,17710239
2,02/26/2022,"Cancún, Mexico",Moon Palace Resort,2423,2423,17710239
3,02/27/2022,"Cancún, Mexico",Moon Palace Resort,2423,2423,17710239
4,04/20/2022,New York City,Madison Square Garden,76470,76470,8787041
5,04/21/2022,New York City,Madison Square Garden,76470,76470,8787041
6,04/22/2022,New York City,Madison Square Garden,76470,76470,8787041
7,04/23/2022,New York City,Madison Square Garden,76470,76470,8787041
8,08/05/2022,Atlantic City,Atlantic City Beach,48577,105000,4728475
9,08/06/2022,Atlantic City,Atlantic City Beach,48577,105000,4728475


In [39]:
# Clean the 2023 box office table (position 25 in money_dfs)
money_dfs_2023 = money_dfs[25]
# regex=False treats '$' as a literal dollar sign rather than a regex anchor
money_dfs_2023['Gross'] = money_dfs_2023['Gross'].str.replace('$', '', regex=False)

# Unlike 2021 and 2022, Attendance is not split into attendance and capacity here

# Select and order the columns; .copy() gives an independent DataFrame so the
# assignments below do not raise SettingWithCopyWarning
desired_column_order = ['Date (2023)', 'City', 'Venue', 'Attendance', 'Gross']
money_dfs_2023 = money_dfs_2023[desired_column_order].copy()

money_dfs_2023['City'] = money_dfs_2023['City'].str.replace(', United States', '')

def convert_date(date_str):
    # Parse 'Month day' strings and return MM/DD/2023; other values pass through unchanged
    try:
        date_obj = pd.to_datetime(date_str, format='%B %d', errors='coerce')
        if not pd.isnull(date_obj):
            date_obj = date_obj.replace(year=2023)
            return date_obj.strftime('%m/%d/%Y')
    except ValueError:
        pass
    return date_str

money_dfs_2023['Date (2023)'] = money_dfs_2023['Date (2023)'].apply(convert_date)

# Drop rows 0, 5, 14, 38, and 39 and renumber the index
money_dfs_2023 = money_dfs_2023.drop([0, 5, 14, 38, 39]).reset_index(drop=True)

money_dfs_2023 = money_dfs_2023.rename(columns={'Date (2023)': 'Date'})

# No capacity figures are extracted for 2023; a scalar fills every row,
# whatever the row count
money_dfs_2023['Capacity'] = None

money_dfs_2023 = money_dfs_2023.fillna(0)

money_dfs[25] = money_dfs_2023

money_dfs[25]


Unnamed: 0,Date,City,Venue,Attendance,Gross,Capacity
0,02/23/2023,Cancún,Moon Palace Resort,0,0,0
1,02/24/2023,Cancún,Moon Palace Resort,0,0,0
2,02/25/2023,Cancún,Moon Palace Resort,0,0,0
3,02/26/2023,Cancún,Moon Palace Resort,0,0,0
4,04/14/2023,Seattle,Climate Pledge Arena,0,0,0
5,04/15/2023,Seattle,Climate Pledge Arena,0,0,0
6,04/17/2023,Berkeley,William Randolph Hearst Greek Theatre,0,0,0
7,04/18/2023,Berkeley,William Randolph Hearst Greek Theatre,0,0,0
8,04/19/2023,Berkeley,William Randolph Hearst Greek Theatre,0,0,0
9,04/21/2023,Los Angeles,Hollywood Bowl,0,0,0
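
The per-year cells above each redefine `convert_date` with a different hard-coded year. A minimal sketch of one shared helper follows. The name `convert_show_date` and its `year` parameter are assumptions introduced here, not part of the notebook, and the sketch uses `datetime.strptime` in place of `pd.to_datetime`.

```python
from datetime import datetime

def convert_show_date(date_str, year):
    """Turn a 'Month day' string into MM/DD/YYYY for the given year.

    Hypothetical helper: values that do not parse (header rows, notes,
    missing values) are returned unchanged, as in the per-year versions.
    """
    try:
        # Parsing the year together with the day also handles February 29
        parsed = datetime.strptime(f"{date_str} {year}", "%B %d %Y")
    except (TypeError, ValueError):
        return date_str
    return parsed.strftime("%m/%d/%Y")

print(convert_show_date("February 20", 2020))  # 02/20/2020
print(convert_show_date("Total", 2020))        # Total
```

With a helper like this, each cell could call `money_dfs_2021['Date (2021)'].apply(convert_show_date, year=2021)` rather than defining its own function.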


In [40]:
phish_concert_data = money_dfs

In [41]:
phish_concert_data[0]

Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,05/29/1993,Salinas,Laguna Seca Raceway,20000,20000,504082
1,05/30/1993,Salinas,Laguna Seca Raceway,20000,20000,504082
2,12/31/1993,Worcester,Centrum in Worcester,14581,14581,320220


## Part 3: Load
#### Loading data to a database

In [42]:
# Create a directory for the CSV files, then write one file per entry in the list
os.makedirs("output", exist_ok=True)

for i, year_data in enumerate(phish_concert_data):
    # Provisional year from the list position; files labelled 2001 and later are renamed in the next cells
    year = 1993 + i
    df = pd.DataFrame(year_data)
    csv_filename = f'phish_{year}.csv'
    file_path = os.path.join("output", csv_filename)

    df.to_csv(file_path, index=False)

In [43]:
# Create a directory for the corrected files, then move the files labelled
# 2001-2003 to their correct years (2002-2004)
os.makedirs(os.path.join("output", "renamed"), exist_ok=True)

start_year = 2001
end_year = 2003

for year in range(start_year, end_year + 1):
    current_filename = os.path.join("output", f'phish_{year}.csv')
    new_filename = os.path.join("output", "renamed", f'phish_{year + 1}.csv')

    os.rename(current_filename, new_filename)

In [44]:
# Repeat the renaming for the files labelled 2004-2018, which belong to 2009-2023
start_year = 2004
end_year = 2018

for year in range(start_year, end_year + 1):
    current_filename = os.path.join("output", f'phish_{year}.csv')
    new_filename = os.path.join("output", "renamed", f'phish_{year + 5}.csv')

    os.rename(current_filename, new_filename)

In [45]:
# Copy the renamed files back into the output folder
source = os.path.join('output', 'renamed')
destination = 'output'

files_to_copy = os.listdir(source)

for file_name in files_to_copy:
    source_file_path = os.path.join(source, file_name)
    destination_file_path = os.path.join(destination, file_name)
    
    shutil.copy(source_file_path, destination_file_path)
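
The renaming cells above correct the years that `1993 + i` assigns to later entries in the list. An alternative is to compute the correct year for each list position before any file is written. The sketch below assumes, as the rename offsets imply, that the 26 entries cover 1993-2000, 2002-2004, and 2009-2023. `tour_years` is a hypothetical name introduced for illustration.

```python
from itertools import chain

def tour_years():
    """Years covered by the list, in order, under the assumption stated above."""
    return list(chain(range(1993, 2001), range(2002, 2005), range(2009, 2024)))

years = tour_years()
# Map each list position to its filename, e.g. position 22 -> phish_2020.csv
filenames = [f"phish_{year}.csv" for year in years]
print(len(filenames), filenames[8], filenames[22], filenames[25])
# 26 phish_2002.csv phish_2020.csv phish_2023.csv
```

With such a list, the write loop could iterate over `zip(tour_years(), phish_concert_data)` and skip the rename and copy steps.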


In [46]:
# Write each CSV file into its own SQLite table
database_path = "phish.sqlite"

conn = sqlite3.connect(database_path)  # creates the file if it does not exist

for filename in os.listdir("output"):
    if filename.endswith(".csv"):
        table_name = os.path.splitext(filename)[0]

        csv_data = pd.read_csv(os.path.join("output", filename))
        # if_exists='replace' drops any existing table and recreates it with
        # column types inferred from the DataFrame, so no CREATE TABLE is needed
        csv_data.to_sql(table_name, conn, if_exists='replace', index=False)

conn.commit()
conn.close()

In [47]:
# Check table creation
database_path = "phish.sqlite"

conn = sqlite3.connect(database_path)

df = pd.read_sql("SELECT * FROM phish_1993", conn)
print(df.head())

conn.close()

         Date       City                 Venue Attendance Capacity    Gross
0  05/29/1993    Salinas   Laguna Seca Raceway    20,000    20,000  504,082
1  05/30/1993    Salinas   Laguna Seca Raceway    20,000    20,000  504,082
2  12/31/1993  Worcester  Centrum in Worcester    14,581    14,581  320,220
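
The check output shows Attendance, Capacity, and Gross still carrying thousands separators, which suggests SQLite is holding them as text. A minimal sketch of stripping the separators before insertion follows, using only the standard library and an in-memory database. `to_int`, the table name `phish_demo`, and the two sample rows (copied from the output above) are illustrative assumptions.

```python
import sqlite3

def to_int(value):
    """Convert '20,000' style strings (or plain numbers) to int."""
    return int(str(value).replace(",", ""))

rows = [
    ("05/29/1993", "Salinas", "Laguna Seca Raceway", "20,000", "20,000", "504,082"),
    ("12/31/1993", "Worcester", "Centrum in Worcester", "14,581", "14,581", "320,220"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phish_demo (Date TEXT, City TEXT, Venue TEXT, "
    "Attendance INTEGER, Capacity INTEGER, Gross INTEGER)"
)
# Clean the three numeric fields before they reach the database
conn.executemany(
    "INSERT INTO phish_demo VALUES (?, ?, ?, ?, ?, ?)",
    [(d, c, v, to_int(a), to_int(cap), to_int(g)) for d, c, v, a, cap, g in rows],
)
# Numeric aggregation now works because Gross is stored as an integer
total_gross = conn.execute("SELECT SUM(Gross) FROM phish_demo").fetchone()[0]
print(total_gross)  # 824302
conn.close()
```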


In [48]:
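# Create the phish_db database in PostgreSQL if it does not already exist
from sqlalchemy import text

# CREATE DATABASE cannot run inside a transaction, so connect in autocommit mode
sql_engine = create_engine('postgresql://postgres:postgres@localhost:5432/postgres',
                           isolation_level='AUTOCOMMIT')
conn = sql_engine.connect()
# Look the database up in the catalog instead of checking for a table
db_exists = conn.execute(text("SELECT 1 FROM pg_database WHERE datname = 'phish_db'")).scalar()
if not db_exists:
    conn.execute(text("CREATE DATABASE phish_db"))
conn.close()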


In [49]:
# Write CSV files into PostgreSQL
sql_engine = create_engine('postgresql://postgres:postgres@localhost:5432/phish_db')
for filename in os.listdir("output"):
    if filename.endswith(".csv"):
        df = pd.read_csv(os.path.join("output", filename))

        # Strip thousands separators so the numeric columns can be cast to int
        for column in ['Attendance', 'Capacity', 'Gross']:
            df[column] = df[column].astype(str).str.replace(',', '').astype(int)

        df.to_sql(
            name=os.path.splitext(filename)[0],
            con=sql_engine,
            if_exists='replace',
            index=False,
            dtype={
                "Date": DateTime,
                "City": Text,
                "Venue": Text,
                "Attendance": Float,
                "Capacity": Float,
                "Gross": Float,
            },
        )

In [50]:
# Check table creation
query = "SELECT * FROM phish_1993"

df = pd.read_sql_query(query, sql_engine)
df.head()


Unnamed: 0,Date,City,Venue,Attendance,Capacity,Gross
0,1993-05-29,Salinas,Laguna Seca Raceway,20000.0,20000.0,504082.0
1,1993-05-30,Salinas,Laguna Seca Raceway,20000.0,20000.0,504082.0
2,1993-12-31,Worcester,Centrum in Worcester,14581.0,14581.0,320220.0
