1)Using regular expressions, extract the Capacity and Attendance counts for each concert from the unformatted text string shown above and store them in two separate Python list objects, i.e., one list containing the Capacity values as integers and one list containing the Attendance values as floats.

2)Using regular expressions, extract the names of each musical artist from the unformatted text string and store them in a Python list object. When complete, your list should contain the following entries:
["Black Stone Cherry", "Lady Gaga", "Paramore", "Rage Against the Machine", "BEYONCE"]

3)Using regular expressions, extract the Gross ticket revenue for each concert from the unformatted text string shown above and store the dollar amounts in a list of floats.

4)Create a new nested dictionary mapping the artist to the following:
1.the average ticket price for each concert based on the number of concert attendees and the gross ticket revenue. Be sure to also format this as a dollar amount (e.g., $10.15 instead of 10.1520039302).
2.a Boolean indicating whether or not the artist’s name is comprised of more than just a single word
3.how full the venue was (i.e., how much of the available capacity was filled according to the attendance). Express this value as a percentage with 2 digits of precision (e.g., 92.07%).
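
The two formatting requirements above can be met with Python format specifiers. This is a minimal sketch, not the required solution; the variable names and sample values are illustrative assumptions and are not taken from the concert data.

```python
# Illustrative values only; they are not taken from the concert data.
price = 10.1520039302
fill_ratio = 0.9207

price_text = f"${price:.2f}"      # two decimal places with a leading dollar sign
fill_text = f"{fill_ratio:.2%}"   # multiplies by 100 and appends a percent sign

print(price_text)
print(fill_text)
```

The `%` presentation type expects a ratio (0.9207), whereas `:.2f` followed by a literal `%` expects a value that has already been multiplied by 100.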

In [38]:
import re

class Event_Processor:
    def __init__(self, text_data):
        self.text_data = text_data


    def extract_capacity_attend(self):
        """Extract the capacity and attendance figures from the text data with regular expressions.
        Commas are stripped before parsing; capacities are returned as a list of integers
        and attendances as a list of floats."""

        # Allow a variable number of dashes after CAPACITY and commas inside the number,
        # so that "CAPACITY--:20000" and "CAPACITY---:25,000" are both captured in full.
        capacity_pattern = r"CAPACITY-+:\s*([\d,]+)"
        attendance_pattern = r"ATTENDANCE:\s*([\d,]+)"

        capacities = [int(match.group(1).replace(',', '')) for match in re.finditer(capacity_pattern, self.text_data)]
        attendances = [float(match.group(1).replace(',', '')) for match in re.finditer(attendance_pattern, self.text_data)]

        return capacities, attendances
  
    
    def extract_musical_artist(self):
        """Extract the artist names from the text data with a regular expression,
        stripping any surrounding whitespace, and return them as a list."""

        artist_pattern = r"\*{5}(.*?)---"
        artists = [match.group(1).strip() for match in re.finditer(artist_pattern, self.text_data)]
        return artists
    
    
    

    def extract_gross_ticket_aven(self):

        """Extract the gross ticket revenue values from the text data with a regular expression,
        parse them, and return them as a list of floats."""

        # Skip any whitespace or dashes between "GATE:" and the dollar sign,
        # e.g. "GATE:--$28,492", "GATE: --$724,087" and "GATE:$2,400,000".
        revenue_pattern = r"GATE:[\s-]*\$([\d,]+)"
        revenues = [float(match.group(1).replace(',', '')) for match in re.finditer(revenue_pattern, self.text_data)]
        return revenues
    
    
    

    def task_four(self):
        
        """Build a nested dictionary that maps each artist to the average ticket price, whether the artist's name
        contains more than one word, and the percentage of the venue's capacity that was filled."""
        
        capacities, attendances = self.extract_capacity_attend()
        artists = self.extract_musical_artist()
        revenues = self.extract_gross_ticket_aven()
        
        artist_info = {}
        for artist, capacity, attendance, revenue in zip(artists, capacities, attendances, revenues):
            avg_ticket_price = revenue / attendance
            artist_info[artist] = {
                'Average Ticket Price': f'${avg_ticket_price:.2f}',
                'Has Multiple Words': ' ' in artist,
                'Venue Fill Percentage': f'{(attendance / capacity * 100):.2f}%'
            }
        
        return artist_info
    
    
    
# Sample text data
text_data = """
JUNE:*****Black Stone Cherry---CAPACITY---:1500 -- $ATTENDANCE: 1,315--GATE:--$28,492 ;
*****Lady Gaga ----CAPACITY---:25,000--- $ATTENDANCE: 24,368---GATE:--$461,956#;
*****Paramore ----CAPACITY---:3000 ---$ATTENDANCE: 3,000 ---GATE:-$150,000;
*****Rage Against the Machine---CAPACITY---:12000 ---$ATTENDANCE: 10,782 ---GATE: --$724,087;
*****BEYONCE---CAPACITY--:20000---$ATTENDANCE: 20,000—-GATE:$2,400,000*****
"""



# Create an instance of the Event_Processor
processor = Event_Processor(text_data)

# Perform the tasks
capacities, attendances = processor.extract_capacity_attend()
artists = processor.extract_musical_artist()
revenues = processor.extract_gross_ticket_aven()
artist_info = processor.task_four()

# Print the results
print("Capacities:", capacities)
print("Attendances:", attendances)
print("Artists:", artists)
print("Revenues:", revenues)
print("artist info:")
for artist, info in artist_info.items():
    print(artist, ":", info)



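Capacities: [1500, 25000, 3000, 12000, 20000]
Attendances: [1315.0, 24368.0, 3000.0, 10782.0, 20000.0]
Artists: ['Black Stone Cherry', 'Lady Gaga', 'Paramore', 'Rage Against the Machine', 'BEYONCE']
Revenues: [28492.0, 461956.0, 150000.0, 724087.0, 2400000.0]
artist info:
Black Stone Cherry : {'Average Ticket Price': '$21.67', 'Has Multiple Words': True, 'Venue Fill Percentage': '87.67%'}
Lady Gaga : {'Average Ticket Price': '$18.96', 'Has Multiple Words': True, 'Venue Fill Percentage': '97.47%'}
Paramore : {'Average Ticket Price': '$50.00', 'Has Multiple Words': False, 'Venue Fill Percentage': '100.00%'}
Rage Against the Machine : {'Average Ticket Price': '$67.16', 'Has Multiple Words': True, 'Venue Fill Percentage': '89.85%'}
BEYONCE : {'Average Ticket Price': '$120.00', 'Has Multiple Words': False, 'Venue Fill Percentage': '100.00%'}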
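
Because `zip` stops at the shortest of its inputs, a pattern that matches fewer concerts than the others makes the loop in `task_four` drop entries without any warning. The sketch below shows one way to guard against that; the helper name `check_equal_lengths` and the sample lists are hypothetical and are not part of the original solution.

```python
def check_equal_lengths(**named_lists):
    """Raise ValueError if the given lists differ in length (hypothetical helper)."""
    lengths = {name: len(values) for name, values in named_lists.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Mismatched list lengths: {lengths}")
    return lengths

# zip() alone silently truncates to the shortest input:
print(list(zip(["a", "b", "c"], [1])))   # [('a', 1)]

# The guard turns that silent truncation into an explicit error:
try:
    check_equal_lengths(artists=["a", "b", "c"], revenues=[1.0])
except ValueError as err:
    print(err)
```

Calling such a check on the four extracted lists before the `zip` loop would surface a pattern that misses some concerts as soon as it happens.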


5)Explain why the regular expression “\D[a-zA-Z]*[^,]=” fails to match the character string “FIdD1E7h=”.



Explanation:
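The string "FIdD1E7h=" cannot be matched by the regular expression "\D[a-zA-Z]*[^,]=" for the following reasons:

The only "=" in the string is its final character, so "[^,]" (any single character other than a comma) has to match the "h" immediately before it. "[a-zA-Z]*" matches zero or more letters, but the character before "h" is the digit "7", so it can only match nothing at that position. That leaves "\D" (a single non-digit character) to match the "7", which fails because "7" is a digit. No other alignment of the pattern is possible, so the search finds no match.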

In [34]:
text = 'FIdD1E7h='
pattern = r'\D[a-zA-Z]*[^,]='
match = re.search(pattern, text)
if match:
    print(match.group())
else:
    print("No match found")


No match found
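
The failure can be traced step by step. The sketch below locates the only place where the tail of the pattern can match and then inspects the character just before it. The relaxed pattern at the end is only one possible change, shown as an assumption to confirm the diagnosis; it is not part of the original question.

```python
import re

text = 'FIdD1E7h='

# The tail "[^,]=" can only match where a non-comma character precedes "=".
tail = re.search(r'[^,]=', text)
print(tail.group(), tail.start())        # h= 7

# The character just before that tail is a digit, so neither \D nor [a-zA-Z] can match it.
before = text[tail.start() - 1]
print(before, before.isdigit())          # 7 True

# Hence the full pattern finds nothing.
print(re.search(r'\D[a-zA-Z]*[^,]=', text))   # None

# Assumption for illustration: allowing digits in the middle class lets the pattern match.
relaxed = re.search(r'\D[a-zA-Z\d]*[^,]=', text)
print(relaxed.group())                   # FIdD1E7h=
```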


6)Consider the character string “The spy was carefully disguised”. We would like to extract only the adverb ‘carefully’ from the string. To do so we write the regular expression “$*\s+ly\w+”. Explain why this fails and correct the expression.


Explanation:
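The word "carefully" cannot be extracted from the string "The spy was carefully disguised" by the regular expression "$*\s+ly\w+" for the following reasons:

"$" is the end-of-string anchor, not a character to be repeated, so "$*" matches none of the letters that precede "ly".
"\s+" matches one or more whitespace characters.
"ly" must then follow that whitespace immediately, but in "carefully" the "ly" is preceded by letters, not by a space.
"\w+" requires one or more word characters after "ly", but "ly" ends the word and is followed by a space.

The pieces are in the wrong order: the word characters must come before "ly", and the word must end there. The corrected expression is "\w+ly\b".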


In [29]:
text = "The spy was carefully disguised"
pattern = r'\w+ly\b'
matches = re.findall(pattern, text)
if matches:
    print(matches[0])

carefully
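
As a quick check, the sketch below runs the original pattern and a corrected one side by side. The leading `\b` and the second sample sentence are additions made here for illustration and do not come from the exercise.

```python
import re

text = "The spy was carefully disguised"

# Original pattern: some regex engines reject a quantifier applied to the "$" anchor,
# so the call is guarded; in either case nothing is extracted.
try:
    original_matches = re.findall(r'$*\s+ly\w+', text)
except re.error:
    original_matches = []
print(original_matches)                  # []

# Corrected pattern, with a leading \b so the match starts at a word boundary.
adverb_pattern = r'\b\w+ly\b'
print(re.findall(adverb_pattern, text))  # ['carefully']

# Hypothetical second sentence showing the pattern finding several words.
print(re.findall(adverb_pattern, "He spoke softly but acted boldly"))  # ['softly', 'boldly']
```

The pattern is only a spelling heuristic: it also matches words such as "family" that end in "ly" but are not adverbs.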
