### Problem 1

**Generating Song Lyrics with NumPy**

**Scenario:** You are working on a RAG-based application that generates creative text such as song lyrics. You have a dataset of popular song lyrics stored in a CSV file. Your task is to use NumPy to:

1. Load the song lyrics data from the CSV file.
2. Preprocess the data by converting all lyrics to lowercase and removing punctuation.
3. Use NumPy to create a random sample of 10 song lyrics from the dataset.

**Dataset:**

[**Lyrics Sample Dataset**](https://docs.google.com/spreadsheets/d/1GmUxCf1m-I94DsVMUaOsTikvh9QqFZpjmLvtNI61ylA/edit?usp=sharing)

----

#### Importing Required Libraries

In [2]:
import pandas as pd
import numpy as np

#### 1. Load the song lyrics data from the CSV file.

In [3]:
lyrics_df = pd.read_csv('./datasets/Lyrics Sample Dataset - Create one for me.csv')
lyrics_df.head()

|   | Lyrics |
|---|---|
| 0 | I wanna dance with somebody (who loves me) |
| 1 | You're the one that I want, oohoo, honey |
| 2 | Baby, baby, baby, oh |
| 3 | A sky full of stars and a heart full of scars |
| 4 | Can't stop the feeling! So excited |


In [4]:
lyrics_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29 entries, 0 to 28
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Lyrics  29 non-null     object
dtypes: object(1)
memory usage: 360.0+ bytes
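
Since the task asks for NumPy, the lyrics can also be loaded into a NumPy array without pandas. The following is a minimal sketch, assuming the CSV has a single `Lyrics` header column; the in-memory `csv_text` is a hypothetical stand-in for the real file, and `lyrics_array` is an illustrative name that is not used elsewhere in the notebook.

```python
import csv
import io

import numpy as np

# Hypothetical in-memory stand-in for the CSV file (assumption: one "Lyrics" column).
csv_text = '''Lyrics
I wanna dance with somebody (who loves me)
"You're the one that I want, oohoo, honey"
"Baby, baby, baby, oh"
'''

# csv.DictReader correctly handles quoted fields that contain commas.
reader = csv.DictReader(io.StringIO(csv_text))
lyrics_array = np.array([row["Lyrics"] for row in reader])

print(lyrics_array.shape)
print(lyrics_array[2])
```

The standard `csv` module is used for parsing because several lyrics contain commas inside quoted fields, which a naive split on `,` would break apart.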


#### 2. Preprocess the data by converting all lyrics to lowercase and removing punctuation.

In [5]:
import string

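# string.punctuation is: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

# Translation table that deletes every ASCII punctuation character (built once).
PUNCTUATION_TABLE = str.maketrans('', '', string.punctuation)

def preprocessing_the_lyrics(lyric):
    """Return the lyric in lowercase with ASCII punctuation removed."""
    return lyric.lower().translate(PUNCTUATION_TABLE)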
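
To see what this preprocessing does to a single string, here is a small self-contained sketch. The name `clean_lyric` is hypothetical and simply mirrors the function above; the second input is an illustrative string with curly apostrophes, not taken from the dataset.

```python
import string

# Translation table mapping every ASCII punctuation character to None (deletion).
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_lyric(lyric):
    """Hypothetical helper: lowercase the text, then drop ASCII punctuation."""
    return lyric.lower().translate(PUNCT_TABLE)

print(clean_lyric("Can't stop the feeling! So excited"))
# cant stop the feeling so excited

# Curly apostrophes are not part of string.punctuation, so they survive.
print(clean_lyric("Don’t stop believin’"))
# don’t stop believin’
```

Two things are worth noting: contractions lose their apostrophes ("can't" becomes "cant"), and `string.punctuation` only covers ASCII, so typographic quotes would need separate handling if the data contained them.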

In [6]:
preprocessed_df = lyrics_df['Lyrics'].apply(preprocessing_the_lyrics)
print("Type of preprocessed data : ",type(preprocessed_df))

Type of preprocessed data :  <class 'pandas.core.series.Series'>
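
As an alternative to `apply`, the same cleaning can be expressed with pandas' vectorized string methods. This is a sketch on a small stand-in Series (the names `lyrics`, `pattern`, and `cleaned` are illustrative), not a replacement for the cell above.

```python
import re
import string

import pandas as pd

# Stand-in data; the notebook itself works on lyrics_df['Lyrics'].
lyrics = pd.Series(["Baby, baby, baby, oh", "Can't stop the feeling! So excited"])

# Character class matching any single ASCII punctuation character.
pattern = f"[{re.escape(string.punctuation)}]"

# Lowercase first, then delete every punctuation character.
cleaned = lyrics.str.lower().str.replace(pattern, "", regex=True)

print(cleaned.tolist())
# ['baby baby baby oh', 'cant stop the feeling so excited']
```

Both approaches return a `Series`; the `.str` version avoids a Python-level function call per row, which may matter on larger datasets.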


In [7]:
preprocessed_df = lyrics_df['Lyrics'].apply(preprocessing_the_lyrics).to_frame()
print("Type of processed data : ", type(preprocessed_df))
preprocessed_df.head()

Type of processed data :  <class 'pandas.core.frame.DataFrame'>


|   | Lyrics |
|---|---|
| 0 | i wanna dance with somebody who loves me |
| 1 | youre the one that i want oohoo honey |
| 2 | baby baby baby oh |
| 3 | a sky full of stars and a heart full of scars |
| 4 | cant stop the feeling so excited |


In [None]:
pd.concat([lyrics_df, preprocessed_df], axis=1).head()

|   | Lyrics | Lyrics |
|---|---|---|
| 0 | I wanna dance with somebody (who loves me) | i wanna dance with somebody who loves me |
| 1 | You're the one that I want, oohoo, honey | youre the one that i want oohoo honey |
| 2 | Baby, baby, baby, oh | baby baby baby oh |
| 3 | A sky full of stars and a heart full of scars | a sky full of stars and a heart full of scars |
| 4 | Can't stop the feeling! So excited | cant stop the feeling so excited |
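
Concatenating two frames that both have a `Lyrics` column yields duplicate column names, which makes later selection by name ambiguous. One way to keep them apart is to rename the cleaned column first. The sketch below uses small stand-in frames, and `Lyrics_clean` is a hypothetical column name.

```python
import pandas as pd

# Small stand-in frames; the notebook uses lyrics_df and preprocessed_df.
original = pd.DataFrame(
    {"Lyrics": ["Baby, baby, baby, oh", "Can't stop the feeling! So excited"]}
)
cleaned = pd.DataFrame(
    {"Lyrics": ["baby baby baby oh", "cant stop the feeling so excited"]}
)

# Rename the cleaned column so the two columns stay distinguishable.
comparison = pd.concat(
    [original, cleaned.rename(columns={"Lyrics": "Lyrics_clean"})],
    axis=1,
)

print(comparison.columns.tolist())
# ['Lyrics', 'Lyrics_clean']
```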


#### 3. Use NumPy to create a random sample of 10 song lyrics from the dataset.

In [33]:
random_indices = np.random.choice(preprocessed_df.index, size=10, replace=False)
random_indices

array([20, 22,  9, 18, 19, 13,  7, 25, 28, 26])
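
The indices above change on every run because no seed is set. If a reproducible sample is wanted, NumPy's `Generator` API can be seeded. This is a sketch, assuming 29 rows as reported by `lyrics_df.info()`; the seed value 42 is arbitrary, and the names `rng`, `indices`, and `sample_idx` are illustrative.

```python
import numpy as np

# Seeded generator: the same seed always produces the same sample.
rng = np.random.default_rng(seed=42)

# Stand-in for preprocessed_df.index (assumption: 29 rows, labelled 0..28).
indices = np.arange(29)

# replace=False guarantees that no index is drawn twice.
sample_idx = rng.choice(indices, size=10, replace=False)

print(sample_idx)
```

Passing `replace=False` matters here: with replacement, the same lyric could appear more than once in the sample of 10.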

In [34]:
random_sample = preprocessed_df.loc[random_indices]

In [35]:
random_sample

|   | Lyrics |
|---|---|
| 20 | im gonna make you an offer you cant refuse |
| 22 | a thousand miles from nowhere |
| 9 | livin on a prayer |
| 18 | work work work work work work |
| 19 | hello from the other side |
| 13 | mmm mmm mmm mmm |
| 7 | dont stop believin |
| 25 | hakuna matata |
| 28 | im a barbie girl in a barbie world |
| 26 | cant buy me love |
