**If you lost points on the last checkpoint, you can get them back by responding to TA/IA feedback**  

Update the relevant sections where you lost those points, and make sure you respond to your TA/IA on GitHub Issues to call their attention to the changes you made here.

Please update your Timeline... no battle plan survives contact with the enemy, so make sure we understand how your plans have changed.

# COGS 108 - Data Checkpoint

# Names

- Chloe Kwan
- Anik Alam
- Anusha Rao
- Sophie Phung
- Patric Sanchez

# Research Question

Our research question is: What is the economic impact of Taylor Swift’s tour stops on the local economies of different cities? How do factors such as venue size, average ticket price, and concert attendance affect sectors such as retail, hospitality, and the restaurant industry?

## Background and Prior Work


In 2023, singer Taylor Swift embarked on a tour spanning her greatest hits, aptly titled “The Eras Tour”. The U.S. leg consisted of 53 shows, beginning in March and ending in August. Swift then continued touring in Latin America, Asia, and Australia into 2024, with plans to tour Europe and play the remaining North American dates through the end of 2024.

Swift has been praised for boosting tourism in the cities she visited on her U.S. tour. As fans flocked to city centers to see her perform, many spent savings accumulated during the COVID-19 pandemic on travel and hospitality in the lead-up to the concert. In what has been dubbed “Swiftonomics<a name="cite_ref-1"></a>[<sup>1</sup>](#cite_note-1),” the economic effects of the Eras Tour have drawn the attention of economists, who are gathering data to measure them. States reported millions of dollars flowing into their economies, and metropolitan areas such as Los Angeles also reported thousands of new jobs created to meet the demand. However, the surge in willingness to spend on tourism also brought record inflation in certain sectors: transportation and lodging prices shot up as limited supply struggled to keep pace with demand.

Previous work on this topic consists primarily of news articles reporting increases in spending when Swift visits a city. For example, one report<a name="cite_ref-2"></a>[<sup>2</sup>](#cite_note-2) notes that the Federal Reserve credited the Eras Tour with boosting the economy and cites a market research estimate that the tour could add $5 billion. Other reports<a name="cite_ref-3"></a>[<sup>3</sup>](#cite_note-3) examine the use of the words “Taylor Swift” as a marketing tool, whether or not the singer actually performs in the city. In San Diego, for instance, a sushi restaurant hosted a Swift-themed night the same week the singer was in Los Angeles, and it immediately sold out. According to another report<a name="cite_ref-4"></a>[<sup>4</sup>](#cite_note-4), across the 53 shows in 20 cities, the average fan spent nearly $1,300 on travel, hotel stays, food, and merchandise. In Pittsburgh, average room rates increased by 106%, with 83% of guests coming from outside the host county.

We have not seen a comprehensive city-by-city analysis, and we are curious which cities saw the greatest boosts in revenue, which sectors contributed most to that growth, and whether ticket prices and the number of tickets sold play a role. Given the variety of demographics and regions the tour visited, our group is interested in analyzing which areas saw the greatest growth in revenue and where inflation hurt consumers the most.
1. <a name="cite_note-1"></a> [^](#cite_ref-1) Mitra, M. (8 Mar 2024) Swiftonomics: The Economic Influence of Taylor Swift. *Investopedia*. https://www.investopedia.com/swiftonomics-definition-8601178
2. <a name="cite_note-2"></a> [^](#cite_ref-2) O'Kane, C. (18 Jul 2023) The Federal Reserve says Taylor Swift's Eras Tour boosted the economy. One market research firm estimates she could add $5 billion. *CBS News*. https://www.cbsnews.com/news/taylor-swift-eras-tour-boosted-economy-tourism-federal-reserve-how-much-money-made/
3. <a name="cite_note-3"></a> [^](#cite_ref-3) Williams, K. (1 Mar 2024) The Taylor Swift economy has reached every town in America, from small business moms to skating rinks and sushi bars. *CNBC*. https://www.cnbc.com/2024/03/01/the-taylor-swift-economic-effect-has-reached-every-town-in-america.html
4. <a name="cite_note-4"></a> [^](#cite_ref-4) U.S. Travel Association. (19 Sept 2023) The Taylor Swift Impact – 5 Months and $5+ Billion. *U.S. Travel Association*. https://www.ustravel.org/news/taylor-swift-impact-5-months-and-5-billion


# Hypothesis



- While the effect may depend on a city's size and economic state, we hypothesize that Taylor Swift’s tour stops cause a sharp spike in local economic activity, particularly in hotels, transportation, food, and other amenities, because of her massive fanbase and cultural impact. A large city such as Los Angeles, for example, might see a relatively smaller impact than smaller cities.
- Regardless of the city or region, we expect a Taylor Swift concert to affect the local economy by raising the prices of most amenities, including food, hotels/lodging, and transportation. However, we believe that international concert locations will see a larger impact than concerts held within the United States.

Because Taylor Swift’s fanbase and cultural impact are so large, we expect businesses near a tour stop to anticipate the demand and adjust their pricing for services and accommodations to maximize profits. Combined with a large influx of tourists, this will naturally increase both demand for and congestion of services, as people book hotels as far in advance as possible, visit local restaurants and fast-food chains, and rely on transportation services, especially if they are traveling from out of town.


# Data

## Data overview

For each dataset include the following information
- Dataset #1
  - Dataset Name: Taylor Concert Tours
  - Link to the dataset: [https://www.kaggle.com/datasets/gayu14/taylor-concert-tours-impact-on-attendance-and](https://www.kaggle.com/datasets/gayu14/taylor-concert-tours-impact-on-attendance-and)
  - Number of observations:
  - Number of variables: 7
- Dataset #2 (if you have more than one!)
  - Dataset Name: U.S. Airline Traffic Data (2003-2023)
  - Link to the dataset: [https://www.kaggle.com/datasets/yyxian/u-s-airline-traffic-data](https://www.kaggle.com/datasets/yyxian/u-s-airline-traffic-data)
  - Number of observations:
  - Number of variables: 17
- etc

Now write 2 - 5 sentences describing each dataset here. Include a short description of the important variables in the dataset; what the metrics and datatypes are, what concepts they may be proxies for. Include information about how you would need to wrangle/clean/preprocess the dataset

If you plan to use multiple datasets, add a few sentences about how you plan to combine these datasets.
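One possible way to combine the two datasets, sketched under stated assumptions: the concert data has a `Tour` label but, judging from the columns shown below, no show dates, while the airline data is monthly, so a join has to go through each tour's date window. The `TOUR_WINDOWS` mapping, the helper functions, and the `revenue_usd` column (assumed to be a numeric version of `Revenue`) are hypothetical; the only window filled in is the Reputation Stadium Tour range (May to November 2018) used in the airtraffic filters below.

```python
import pandas as pd

# Hypothetical mapping from a tour label to its inclusive (year, month) window.
# Only the Reputation Stadium Tour window used in the airtraffic filters is filled in.
TOUR_WINDOWS = {
    "Reputation_Stadium_Tour": ((2018, 5), (2018, 11)),
}


def month_index(year, month):
    """Convert (year, month) into one month counter so windows compare numerically."""
    return year * 12 + (month - 1)


def label_tour_months(air: pd.DataFrame, windows=TOUR_WINDOWS) -> pd.DataFrame:
    """Add a 'Tour' column naming the tour (if any) whose window covers each month."""
    ym = month_index(air["Year"], air["Month"])
    labels = pd.Series(pd.NA, index=air.index, dtype="object")
    for tour, (start, end) in windows.items():
        in_window = ym.between(month_index(*start), month_index(*end))
        labels.loc[in_window] = tour
    return air.assign(Tour=labels)


def combine_by_tour(concerts: pd.DataFrame, air: pd.DataFrame, windows=TOUR_WINDOWS) -> pd.DataFrame:
    """One row per tour: total concert revenue next to mean monthly passengers."""
    revenue = concerts.groupby("Tour", as_index=False).agg(revenue_usd=("revenue_usd", "sum"))
    traffic = (
        label_tour_months(air, windows)
        .dropna(subset=["Tour"])
        .groupby("Tour", as_index=False)
        .agg(mean_monthly_pax=("Pax", "mean"))
    )
    # Inner join: tours without a date window (or without flight data) are dropped.
    return revenue.merge(traffic, on="Tour", how="inner")
```

Since the concert rows shown below repeat the same totals for multi-night stops (Santa Clara, Pasadena), those repeats would need to be collapsed before summing revenue, or the total would double count them.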

## Taylor Concert Tours

In [25]:
## YOUR CODE TO LOAD/CLEAN/TIDY/WRANGLE THE DATA GOES HERE
## FEEL FREE TO ADD MULTIPLE CELLS PER SECTION 
import pandas as pd
import numpy as np

In [44]:
# Read the initial DataFrame, decoding the CSV as ISO-8859-1 (Latin-1)
taylor_df = pd.read_csv('Taylor_Train.csv', encoding="ISO-8859-1")
taylor_df.head()

|   | City | Country | Venue | Opening act(s) | Attendance (tickets sold / available) | Revenue | Tour |
|---|---|---|---|---|---|---|---|
| 0 | Evansville | United States | Roberts Municipal Stadium | Gloriana, Kellie Pickler | 7,463 / 7,463 | $360,617 | Fearless_Tour |
| 1 | Jonesboro | United States | Convocation Center | Gloriana, Kellie Pickler | 7,822 / 7,822 | $340,328 | Fearless_Tour |
| 2 | St. Louis | United States | Scottrade Center | Gloriana, Kellie Pickler | 13,764 / 13,764 | $650,420 | Fearless_Tour |
| 3 | Alexandria | United States | Bishop Ireton High School | Gloriana, Kellie Pickler | NaN | NaN | Fearless_Tour |
| 4 | North Charleston | United States | North Charleston Coliseum | Gloriana, Kellie Pickler | 8,751 / 8,751 | $398,154 | Fearless_Tour |
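The head of `taylor_df` shows that attendance and revenue are stored as text (`"7,463 / 7,463"`, `"$360,617"`) and can be missing (the Alexandria row), so they need to be parsed before any arithmetic. A minimal sketch, assuming every non-missing attendance value follows the `sold / available` pattern shown; the function and the new column names (`tickets_sold`, `tickets_available`, `revenue_usd`, `sell_through`) are hypothetical.

```python
import numpy as np
import pandas as pd

ATTENDANCE_COL = "Attendance (tickets sold / available)"


def parse_attendance_revenue(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric ticket counts, revenue, and a sell-through ratio."""
    out = df.copy()
    # "7,463 / 7,463" -> two captured groups; missing or non-matching values become NaN
    counts = out[ATTENDANCE_COL].str.extract(r"([\d,]+)\s*/\s*([\d,]+)")
    out["tickets_sold"] = pd.to_numeric(counts[0].str.replace(",", "", regex=False), errors="coerce")
    out["tickets_available"] = pd.to_numeric(counts[1].str.replace(",", "", regex=False), errors="coerce")
    # "$360,617" -> 360617.0; anything that is not a plain dollar amount becomes NaN
    out["revenue_usd"] = pd.to_numeric(out["Revenue"].str.replace(r"[$,]", "", regex=True), errors="coerce")
    # Share of available tickets that were sold (1.0 means a sellout)
    out["sell_through"] = out["tickets_sold"] / out["tickets_available"]
    return out


# Two rows copied from the head() output above, including one with missing values
sample = pd.DataFrame({
    "City": ["Evansville", "Alexandria"],
    ATTENDANCE_COL: ["7,463 / 7,463", np.nan],
    "Revenue": ["$360,617", np.nan],
})
print(parse_attendance_revenue(sample)[["City", "tickets_sold", "revenue_usd", "sell_through"]])
```

On the real data this would be applied as `taylor_df = parse_attendance_revenue(taylor_df)`; using `errors="coerce"` turns any unexpected format into NaN rather than raising, so it is worth counting the NaNs afterwards.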


In [45]:
# Drop the opening-act column and keep only the Reputation Stadium Tour rows
taylor_df = taylor_df.drop(columns='Opening act(s)')
reputation_df = taylor_df[taylor_df['Tour'] == "Reputation_Stadium_Tour"].reset_index(drop=True)
reputation_df.head()

|   | City | Country | Venue | Attendance (tickets sold / available) | Revenue | Tour |
|---|---|---|---|---|---|---|
| 0 | Glendale | United States | University of Phoenix Stadium | 59,157 / 59,157 | $7,214,478 | Reputation_Stadium_Tour |
| 1 | Santa Clara | United States | Levi's Stadium | 107,550 / 107,550 | $14,006,963 | Reputation_Stadium_Tour |
| 2 | Santa Clara | United States | Levi's Stadium | 107,550 / 107,550 | $14,006,963 | Reputation_Stadium_Tour |
| 3 | Pasadena | United States | Rose Bowl | 118,084 / 118,084 | $16,251,980 | Reputation_Stadium_Tour |
| 4 | Pasadena | United States | Rose Bowl | 118,084 / 118,084 | $16,251,980 | Reputation_Stadium_Tour |
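In the filtered output, the two Santa Clara rows and the two Pasadena rows carry identical attendance and revenue figures, which suggests (an assumption worth checking against the original box-score source) that each figure is a multi-night total repeated on every night's row. Summing the raw rows would then double count those stops. A minimal sketch that collapses exact repeats and records how many rows were merged; `collapse_repeated_rows` and `n_rows` are hypothetical names.

```python
import pandas as pd


def collapse_repeated_rows(df: pd.DataFrame, key=None) -> pd.DataFrame:
    """Collapse rows that are identical on `key` (default: all columns).

    Keeps one row per distinct combination, in first-seen order, plus an
    'n_rows' column counting how many original rows it replaces.
    """
    key = list(df.columns) if key is None else list(key)
    return (
        df.groupby(key, sort=False, dropna=False)  # dropna=False keeps rows with missing values
        .size()
        .rename("n_rows")
        .reset_index()
    )


# The five Reputation Stadium Tour rows shown above
sample = pd.DataFrame({
    "City": ["Glendale", "Santa Clara", "Santa Clara", "Pasadena", "Pasadena"],
    "Country": ["United States"] * 5,
    "Venue": ["University of Phoenix Stadium", "Levi's Stadium", "Levi's Stadium", "Rose Bowl", "Rose Bowl"],
    "Attendance (tickets sold / available)": [
        "59,157 / 59,157", "107,550 / 107,550", "107,550 / 107,550",
        "118,084 / 118,084", "118,084 / 118,084",
    ],
    "Revenue": ["$7,214,478", "$14,006,963", "$14,006,963", "$16,251,980", "$16,251,980"],
    "Tour": ["Reputation_Stadium_Tour"] * 5,
})
print(collapse_repeated_rows(sample)[["City", "Venue", "n_rows"]])
```

Keeping `n_rows` preserves the number of nights, which could still be useful (for example, revenue per night) after the duplicate totals are removed.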


## U.S. Airline Traffic Data (2003-2023)

In [20]:
## YOUR CODE TO LOAD/CLEAN/TIDY/WRANGLE THE DATA GOES HERE
## FEEL FREE TO ADD MULTIPLE CELLS PER SECTION 

# Read the initial DataFrame
airtraffic = pd.read_csv('airtraffic.csv')
airtraffic.head()

|   | Year | Month | Dom_Pax | Int_Pax | Pax | Dom_Flt | Int_Flt | Flt | Dom_RPM | Int_RPM | RPM | Dom_ASM | Int_ASM | ASM | Dom_LF | Int_LF | LF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2003 | 1 | 43032450 | 4905830 | 47938280 | 785160 | 57667 | 842827 | 36211422 | 12885980 | 49097402 | 56191300 | 17968572 | 74159872 | 64.44 | 71.71 | 66.2 |
| 1 | 2003 | 2 | 41166780 | 4245366 | 45412146 | 690351 | 51259 | 741610 | 34148439 | 10715468 | 44863907 | 50088434 | 15587880 | 65676314 | 68.18 | 68.74 | 68.31 |
| 2 | 2003 | 3 | 49992700 | 5008613 | 55001313 | 797194 | 58926 | 856120 | 41774564 | 12567068 | 54341633 | 57592901 | 17753174 | 75346075 | 72.53 | 70.79 | 72.12 |
| 3 | 2003 | 4 | 47033260 | 4345444 | 51378704 | 766260 | 55005 | 821265 | 39465980 | 10370592 | 49836572 | 54639679 | 15528761 | 70168440 | 72.23 | 66.78 | 71.02 |
| 4 | 2003 | 5 | 49152352 | 4610834 | 53763186 | 789397 | 55265 | 844662 | 41001934 | 11575026 | 52576960 | 55349897 | 15629821 | 70979718 | 74.08 | 74.06 | 74.07 |
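The column names suggest that each total is the sum of its domestic (`Dom_`) and international (`Int_`) parts (passengers `Pax`, flights `Flt`, revenue passenger-miles `RPM`, available seat-miles `ASM`) and that `LF`, the load factor, is `RPM / ASM` as a percentage. In the rows above, for example, January 2003 has 43,032,450 + 4,905,830 = 47,938,280 passengers and 100 × 49,097,402 / 74,159,872 ≈ 66.2. A quick sanity check of those relationships, sketched with tolerances (an assumption) because the published figures are rounded: March 2003's `RPM` is 1 higher than the sum of its parts.

```python
import pandas as pd


def check_airtraffic_identities(df: pd.DataFrame, rel_tol: float = 1e-6, lf_tol: float = 0.01) -> dict:
    """Return {column: bool} saying whether each accounting identity holds on every row.

    - Pax, Flt, RPM, ASM should equal Dom_<col> + Int_<col> (within rel_tol, for rounding)
    - LF should equal 100 * RPM / ASM (within lf_tol percentage points, since LF is rounded)
    """
    results = {}
    for total in ["Pax", "Flt", "RPM", "ASM"]:
        parts = df[f"Dom_{total}"] + df[f"Int_{total}"]
        within = (parts - df[total]).abs() <= rel_tol * df[total].abs()
        results[total] = bool(within.all())
    implied_lf = 100 * df["RPM"] / df["ASM"]
    results["LF"] = bool(((implied_lf - df["LF"]).abs() <= lf_tol).all())
    return results
```

Running `check_airtraffic_identities(airtraffic)` on the full table would confirm (or refute) this reading of the columns; any `False` points to rows worth inspecting before they feed into comparisons.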


In [23]:
# Filter flight logs by Tour Time Frames

# Reputation Tour: May 2018 - Nov 2018
reputation_tour = airtraffic[(airtraffic['Year'] == 2018) & (airtraffic['Month'] >= 5) & (airtraffic['Month'] <= 11)]
reputation_tour

|   | Year | Month | Dom_Pax | Int_Pax | Pax | Dom_Flt | Int_Flt | Flt | Dom_RPM | Int_RPM | RPM | Dom_ASM | Int_ASM | ASM | Dom_LF | Int_LF | LF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 184 | 2018 | 5 | 67844134 | 9416071 | 77260205 | 724065 | 73277 | 797342 | 62580568 | 25356444 | 87937012 | 73310971 | 31133051 | 104444022 | 85.36 | 81.45 | 84.2 |
| 185 | 2018 | 6 | 70277734 | 10569323 | 80847057 | 732292 | 77394 | 809686 | 66095829 | 28194872 | 94290701 | 75323742 | 32486010 | 107809753 | 87.75 | 86.79 | 87.46 |
| 186 | 2018 | 7 | 72539780 | 11391458 | 83931238 | 759879 | 81620 | 841499 | 69091745 | 29716690 | 98808435 | 78406927 | 34171666 | 112578593 | 88.12 | 86.96 | 87.77 |
| 187 | 2018 | 8 | 70337936 | 10624720 | 80962656 | 756345 | 77086 | 833431 | 66132022 | 28638849 | 94770872 | 76722785 | 33238079 | 109960864 | 86.2 | 86.16 | 86.19 |
| 188 | 2018 | 9 | 60468115 | 8208432 | 68676547 | 686422 | 65032 | 751454 | 55431352 | 23604682 | 79036033 | 68530468 | 29479449 | 98009918 | 80.89 | 80.07 | 80.64 |
| 189 | 2018 | 10 | 67081594 | 8399032 | 75480626 | 722634 | 66557 | 789191 | 61011629 | 23294709 | 84306338 | 72044924 | 29079416 | 101124339 | 84.69 | 80.11 | 83.37 |
| 190 | 2018 | 11 | 64659493 | 8052683 | 72712176 | 678121 | 64528 | 742649 | 58779361 | 20613643 | 79393004 | 69098147 | 25689087 | 94787234 | 85.07 | 80.24 | 83.76 |
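Each tour window in these cells is written as a separate Year/Month filter, which only works while a window stays inside one calendar year. A sketch of a reusable helper, assuming inclusive bounds like the original filters, that converts (year, month) into a single month counter so a window can also cross a year boundary; `filter_window` is a hypothetical name.

```python
import pandas as pd


def filter_window(df: pd.DataFrame, start: tuple, end: tuple) -> pd.DataFrame:
    """Rows whose (Year, Month) lies in [start, end], inclusive; start/end are (year, month)."""
    month_counter = df["Year"] * 12 + (df["Month"] - 1)
    lo = start[0] * 12 + (start[1] - 1)
    hi = end[0] * 12 + (end[1] - 1)
    return df[month_counter.between(lo, hi)]


# Equivalent to the Reputation Tour filter above:
#   reputation_tour = filter_window(airtraffic, (2018, 5), (2018, 11))
# A window that crosses a year boundary:
demo = pd.DataFrame({"Year": [2022, 2022, 2022, 2023, 2023, 2023], "Month": [10, 11, 12, 1, 2, 3]})
print(filter_window(demo, (2022, 11), (2023, 2)))
```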


In [36]:
# Flight logs during the U.S. leg of the Eras Tour: March 2023 - August 2023
eras_tour = airtraffic[(airtraffic['Year'] == 2023) & (airtraffic['Month'] >= 3) & (airtraffic['Month'] <= 8)]
eras_tour

|   | Year | Month | Dom_Pax | Int_Pax | Pax | Dom_Flt | Int_Flt | Flt | Dom_RPM | Int_RPM | RPM | Dom_ASM | Int_ASM | ASM | Dom_LF | Int_LF | LF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 242 | 2023 | 3 | 69708556 | 10094584 | 79803140 | 655836 | 71076 | 726912 | 65767207 | 23802743 | 89569950 | 78074142 | 28689319 | 106763461 | 84.24 | 82.97 | 83.9 |
| 243 | 2023 | 4 | 67571574 | 10008645 | 77580219 | 636160 | 70078 | 706238 | 63426143 | 24708597 | 88134739 | 74335449 | 29988666 | 104324115 | 85.32 | 82.39 | 84.48 |
| 244 | 2023 | 5 | 71423653 | 10358666 | 81782319 | 667331 | 71924 | 739255 | 66743565 | 26805432 | 93548998 | 77821407 | 31950687 | 109772094 | 85.77 | 83.9 | 85.22 |
| 245 | 2023 | 6 | 72482621 | 11544505 | 84027126 | 661293 | 75279 | 736572 | 68789127 | 29883465 | 98672591 | 78058358 | 33410671 | 111469028 | 88.13 | 89.44 | 88.52 |
| 246 | 2023 | 7 | 75378157 | 12432615 | 87810772 | 684939 | 79738 | 764677 | 72267904 | 31376000 | 103643904 | 81986010 | 35326191 | 117312202 | 88.15 | 88.82 | 88.35 |
| 247 | 2023 | 8 | 71477988 | 11572149 | 83050137 | 691482 | 77137 | 768619 | 67933484 | 29938507 | 97871992 | 81997399 | 34908793 | 116906192 | 82.85 | 85.76 | 83.72 |


In [37]:
# Flight logs before the Eras Tour: January - February 2023
pre_eras = airtraffic[(airtraffic['Year'] == 2023) & (airtraffic['Month'] <= 2)]
pre_eras

|   | Year | Month | Dom_Pax | Int_Pax | Pax | Dom_Flt | Int_Flt | Flt | Dom_RPM | Int_RPM | RPM | Dom_ASM | Int_ASM | ASM | Dom_LF | Int_LF | LF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 240 | 2023 | 1 | 58284200 | 9066833 | 67351033 | 605944 | 66859 | 672803 | 55627743 | 21021245 | 76648988 | 72122699 | 26809510 | 98932209 | 77.13 | 78.41 | 77.48 |
| 241 | 2023 | 2 | 56729998 | 8066893 | 64796891 | 567702 | 60643 | 628345 | 53329402 | 18254936 | 71584338 | 67037368 | 24211736 | 91249104 | 79.55 | 75.4 | 78.45 |


In [38]:
# Flight logs after the U.S. leg of the Eras Tour: September - December 2023
# (only September 2023 is returned, so the data appears to end there)
post_eras = airtraffic[(airtraffic['Year'] == 2023) & (airtraffic['Month'] >= 9) & (airtraffic['Month'] <= 12)]
post_eras

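|   | Year | Month | Dom_Pax | Int_Pax | Pax | Dom_Flt | Int_Flt | Flt | Dom_RPM | Int_RPM | RPM | Dom_ASM | Int_ASM | ASM | Dom_LF | Int_LF | LF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 248 | 2023 | 9 | 66858490 | 9392985 | 76251475 | 649308 | 64241 | 713549 | 61777546 | 26076318 | 87853864 | 75748336 | 31231710 | 106980046 | 81.56 | 83.49 | 82.12 |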
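To turn the pre/during/post splits into numbers, one option is to compare mean monthly passengers in each period against the pre-tour baseline. Air travel is seasonal, though: in the rows above, January and February 2023 each carry fewer passengers than any month from March through September, so a raw before/during/after difference mixes any tour effect with the usual travel peak. A year-over-year change (each month against the same month one year earlier) is one way to reduce that confound. A minimal sketch of both; the function names are hypothetical, and `year_over_year` assumes the data has one row per month with no gaps.

```python
import pandas as pd


def period_means(periods: dict, col: str = "Pax", baseline: str = "pre") -> pd.DataFrame:
    """Mean of `col` per named period and its percent change versus the baseline period."""
    means = pd.Series({name: frame[col].mean() for name, frame in periods.items()})
    return pd.DataFrame({
        "mean": means,
        f"pct_vs_{baseline}": 100 * (means / means[baseline] - 1),
    })


def year_over_year(df: pd.DataFrame, col: str = "Pax") -> pd.Series:
    """Percent change of `col` versus the same month one year earlier.

    Assumes one row per month with no gaps, so a 12-row shift is exactly one year.
    """
    ordered = df.sort_values(["Year", "Month"])
    return 100 * ordered[col].pct_change(periods=12)


# Usage with the DataFrames defined above:
#   period_means({"pre": pre_eras, "during": eras_tour, "post": post_eras})
#   year_over_year(airtraffic)
```

Both outputs are descriptive: nationwide monthly totals can show whether tour months were unusually busy, but on their own they cannot attribute a change to the tour rather than to other travel trends.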


# Ethics & Privacy

Possible ethical implications include ensuring the informed consent of people whose data may be collected, such as their financial or demographic information. However, the datasets we plan to use contain no personally identifiable information, so the risks to privacy and confidentiality are limited. We should also consider the long-term consequences of analyzing and sharing our results with the public, since they may influence public perceptions of Taylor Swift and the Eras Tour and could affect advertising and investment decisions. Our analysis may also have ethical implications for the hotels, small businesses, and restaurants whose finances we would examine as part of each city's local economy, and for local residents whose daily lives are disrupted. We must therefore ensure that the data we use is accurate and transparently collected, so that biases do not lead to inaccurate conclusions that harm businesses and communities.


# Team Expectations 


* **To respond in a timely manner and communicate work, concerns, and conflicts.**
* **To complete work in a timely manner.**
* **To contribute the expected amount as communicated with the team.**

Though work may be split differently across specific portions of the project, all delegated tasks are expected to be completed by their outlined due dates. Team members are expected to contribute to the following project aspects, as outlined in the Team Policies:
- “Deciding on the project topic, searching for possible datasets, and honing the data science question.
- Writing well-commented and clear code to wrangle, explore, visualize, analyze, and communicate your groups’ findings.
- Writing the accompanying text throughout the project to explain each section.
- Editing the text and code throughout your project for grammar, misspellings, and clarity.”

(Courtesy of the COGS 108 Team Policies) 

Team members are also expected to attend regular meetings scheduled in advance, to work individually or in pairs when needed, and to communicate any problems they encounter, such as scheduling conflicts or issues with the project.

If a team member does not cooperate, they will be sent an email noting their lack of contribution and asking them to show significant improvement on their assigned tasks within one week. If no action is taken, the professor will be notified.


# Project Timeline Proposal




| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 5/10  |  3 PM | Proposal submitted, begin compiling data for the dataset| Discuss possible factors (narrowing or widening the list), discuss possible analytical approaches. Assign and delegate work. | 
| 5/17  |  3 PM |  Import, Wrangle, & Clean Data. Begin EDA | Review data and cleaning processes. Discuss how to proceed with analytics. Assign work. | 
| 5/24  |  3 PM  | Finish dataset management, wrangling. Reach a midway point regarding analytics| Discuss analysis work. Continue delegating remaining work. |
| 5/31  |  3 PM  | Complete analysis; Brainstorm conclusion remarks.| Discuss results, finish analyses, write-up conclusions.|
| 6/7   |  3 PM  | Brainstorm video ideas, write out script.| Film and complete video, ensure it meets all guidelines stated by the syllabus. Complete surveys.|
| 6/12  | Before 11:59 PM  | Final project*, video*, team eval survey, post-course survey due | Turn in Final Project & Group Project Surveys |