# Introduction

![](https://upload.wikimedia.org/wikipedia/commons/thumb/b/b7/Woodstock_poster.jpg/800px-Woodstock_poster.jpg)

Woodstock, the legendary music festival held from August 15 to August 18, 1969, in Bethel, New York, stands as an iconic symbol of counterculture and peace. Billed as "3 Days of Peace & Music," it drew hundreds of thousands of young people, artists, and activists to a dairy farm for an unprecedented celebration of music, love, and unity. The lineup featured groundbreaking musicians, including Jimi Hendrix, Janis Joplin, The Who, and many more, making the event a cultural touchstone of the 1960s. Beyond the music, Woodstock epitomized the spirit of a generation that sought social change: civil rights, environmental consciousness, and an end to the Vietnam War. It remains a symbol of hope and a testament to the power of music and the human spirit.

In this notebook, we import the lineup tables from the Woodstock Wikipedia page and combine them into a small dataset.

# Setup

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt


In [None]:
dfs = pd.read_html("https://en.wikipedia.org/wiki/Woodstock")
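
`pd.read_html` returns one DataFrame per table it finds on the page, so picking tables by position depends on the page's current layout. A minimal sketch of a more defensive selection follows, assuming the lineup tables are the ones that contain both an `Artist` and a `Time` column (the two columns this notebook uses later). The helper name `select_tables` and the toy tables are hypothetical and stand in for the real list.

```python
import pandas as pd


def select_tables(tables, required_columns):
    """Return the tables whose columns include every name in required_columns."""
    required = set(required_columns)
    return [t for t in tables if required.issubset(set(map(str, t.columns)))]


# Toy stand-ins for the list returned by pd.read_html (made-up data)
tables = [
    pd.DataFrame({"Key": ["Dates"], "Value": ["August 1969"]}),
    pd.DataFrame({"Artist": ["Artist A"], "Time": ["5:00 PM – 6:00 PM"]}),
    pd.DataFrame({"Artist": ["Artist B"], "Time": ["7:00 PM – 8:00 PM"]}),
]

# Keep only the tables that look like lineup tables
lineup_tables = select_tables(tables, ["Artist", "Time"])
print(len(lineup_tables))
```

Selecting by column names keeps working if unrelated tables are added or removed before the lineup tables, although it still assumes the column headers themselves stay the same.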

# Creating the dataframe 

In [None]:
# dfs[1] is taken to be the first lineup table; the index depends on the page layout
fridayToSaturday = dfs[1]

fridayToSaturday["Day"] = "Friday, August 15 – Saturday, August 16"

fridayToSaturday

In [None]:
saturdayToSunday = dfs[2]

saturdayToSunday["Day"] = "Saturday, August 16 – Sunday, August 17"

saturdayToSunday

In [None]:
sundayToMonday = dfs[3]

sundayToMonday["Day"] = "Sunday, August 17 – Monday, August 18"

sundayToMonday

In [None]:
df = pd.concat([fridayToSaturday, saturdayToSunday, sundayToMonday], ignore_index=True)

df
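
The three labelling cells above repeat the same step, so they can also be written as a single pass. The sketch below assumes the three lineup tables are already in a list in festival order; `lineup_tables` and the toy rows are hypothetical. `DataFrame.assign` returns a new frame, so the original tables are left unmodified.

```python
import pandas as pd

day_labels = [
    "Friday, August 15 – Saturday, August 16",
    "Saturday, August 16 – Sunday, August 17",
    "Sunday, August 17 – Monday, August 18",
]

# Toy stand-ins for the three lineup tables (made-up data)
lineup_tables = [
    pd.DataFrame({"Artist": ["Artist A", "Artist B"]}),
    pd.DataFrame({"Artist": ["Artist C"]}),
    pd.DataFrame({"Artist": ["Artist D"]}),
]

# Tag each table with its day label, then stack them into one frame
combined = pd.concat(
    [table.assign(Day=label) for table, label in zip(lineup_tables, day_labels)],
    ignore_index=True,
)
print(combined)
```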

# Data cleaning and transformation

In [None]:
# Rewrite "Midnight" as a 12-hour clock time so it can be parsed with '%I:%M %p' later
df['Time'] = df['Time'].str.replace('Midnight', '12:00 AM')

In [None]:
# Splitting the time column
df[['Start Time', 'End Time']] = df['Time'].str.split(' – ', expand=True)

# Convert starting and ending time columns to datetime objects
df['Start Time'] = pd.to_datetime(df['Start Time'], format='%I:%M %p')
df['End Time'] = pd.to_datetime(df['End Time'], format='%I:%M %p')

# Calculate concert duration
df['Concert Duration'] = df['End Time'] - df['Start Time']

# Format the datetime columns to show only the time
df['Start Time'] = df['Start Time'].dt.strftime('%I:%M %p')
df['End Time'] = df['End Time'].dt.strftime('%I:%M %p')

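# Convert "Concert Duration" to a string such as "0 days 01:30:00"
df['Concert Duration'] = df['Concert Duration'].astype(str)

# Keep only the HH:MM part. A set that ends after midnight gives a negative
# difference, rendered as "-1 days +HH:MM:SS", so dropping the day part also
# wraps the duration back to a positive value
df['Concert Duration'] = df['Concert Duration'].str.extract(r'(\d+:\d+)')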

# Function to convert HH:MM to minutes
def convert_to_minutes(time_str):
    hours, minutes = map(int, time_str.split(':'))
    return hours * 60 + minutes

# Apply the function to the 'Concert Duration' column
df['Concert Duration (minutes)'] = df['Concert Duration'].apply(convert_to_minutes)
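
Because the start and end times are parsed without a date, a set that ends after midnight produces a negative difference. As a sketch of an alternative to the string round trip above, the wrap-around can be handled numerically by taking the difference in minutes modulo one day. The sample times below are made up for illustration, and the approach assumes no set lasts longer than 24 hours.

```python
import pandas as pd

# Made-up sample times; the first row crosses midnight
times = pd.DataFrame({
    "Start Time": ["11:00 PM", "5:07 PM", "12:00 AM"],
    "End Time": ["12:30 AM", "7:00 PM", "1:15 AM"],
})

start = pd.to_datetime(times["Start Time"], format="%I:%M %p")
end = pd.to_datetime(times["End Time"], format="%I:%M %p")

# Difference in whole minutes; negative when the set ends after midnight
raw_minutes = ((end - start).dt.total_seconds() // 60).astype(int)

# Wrap negative values into the 0 to 1439 range (one day has 1440 minutes)
minutes = raw_minutes % (24 * 60)
print(minutes.tolist())
```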

In [None]:
df

# Concert duration

In [None]:
plt.figure(figsize=(10, 8))

# Pass the frame by keyword; older seaborn versions treat the first positional argument as x
sns.barplot(data=df, y="Artist", x="Concert Duration (minutes)")

plt.title("Concert Durations at Woodstock")

# Number of concerts per day

In [None]:
numOfConcerts = df.Day.value_counts()

sns.barplot(y=numOfConcerts.index, x=numOfConcerts.values)

plt.title("Number of Concerts by Day at Woodstock")
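
`value_counts` orders its result by count, so the days may not appear in festival order on the chart. A minimal sketch of a chronological count, assuming the rows are still in the order in which the tables were concatenated (the `schedule` frame and its labels are made up):

```python
import pandas as pd

# Made-up schedule with rows in festival order
schedule = pd.DataFrame({
    "Day": ["Fri–Sat", "Fri–Sat", "Sat–Sun", "Sat–Sun", "Sat–Sun", "Sun–Mon"],
})

# sort=False keeps the groups in order of first appearance instead of sorting them
concerts_per_day = schedule.groupby("Day", sort=False).size()
print(concerts_per_day)
```

The resulting Series can be passed to `sns.barplot` the same way as `numOfConcerts`, using its `index` and `values`.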


# Exporting as CSV

In [None]:
df.to_csv("WoodstockData.csv", index=False)