# **Baby Name Generator**

The journey of parenthood is filled with many exciting milestones, one of the most cherished being the selection of a baby’s name. A name carries deep significance, often reflecting cultural heritage, personal preferences, and hopes for the future. However, the process of choosing the perfect name can be overwhelming, with countless options and factors to consider.

To assist parents in this important decision, we have created a Baby Name Generator. This tool is designed to simplify and enhance the name selection process.

# **Importing Libraries**

In [1]:
pip install requests beautifulsoup4

Note: you may need to restart the kernel to use updated packages.


In [8]:
pip install sentence-transformers


Collecting sentence-transformers
  Downloading sentence_transformers-3.0.1-py3-none-any.whl.metadata (10 kB)
Downloading sentence_transformers-3.0.1-py3-none-any.whl (227 kB)
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-3.0.1
Note: you may need to restart the kernel to use updated packages.


In [9]:
pip install transformers


Note: you may need to restart the kernel to use updated packages.


# **Data Scraping**

To create a robust and culturally rich Baby Name Generator, we sourced our data from the website Incompetech. This platform provided a diverse collection of names, which we specifically filtered to focus on African names from various countries.

Our goal was to curate a dataset that reflects the vast cultural heritage and linguistic diversity of Africa. For each name, we captured key attributes including:

- **Gender**: Identifying whether the name is typically used for boys or girls.

- **Meaning**: Providing the significance or symbolism behind the name, which is often rooted in the cultural and linguistic traditions of its origin.
- **Origin**: Indicating the country or cultural background from which the name originates.

This data collection ensures that parents using our Baby Name Generator have access to a wide array of names that are not only unique and meaningful but also deeply connected to African heritage.

In [2]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Base URL of the site
base_url = "https://incompetech.com/named/multi.pl"

# Define categories
categories = [
    "African - Abaluhyan", "African - Akan", "African - American",
    "African - Bantu", "African - Botswana", "African - Egyptian",
    "African - Ethiopian", "African - Ewe", "African - Fanti", "African - Ghanian",
    "African - Hausa", "African - Ibo", "African - Kenyan", "African - Kikuyu",
    "African - Lesotho", "African - Lugandan", "African - Malawian",
    "African - Musoga", "African - Nguni", "African - Nigerian", "African - Ochi",
    "African - Rukonjo", "African - Runyoro", "African - Rutooro", "African - Rwandan",
    "African - Somalian", "African - Swahili", "African - Tanzanian", "African - Ugandan",
    "African - Xhosha", "African - Yoruba", "African - Yoruban", "African - Zimbabwe",
    "African - Zulu", "African - Zuni", "Arabic"
]

def scrape_data(category, gender, start_seed=0, increment=50):
    all_data = []
    seed = start_seed
    while True:
        print(f"Fetching data for {category} ({gender}) with seed {seed}...")
        
        params = {
            'sex': gender,
            'language': category,
            'method': 'contains',
            'name': '',
            'meaning': '',
            'seed': seed,
            'action': 'search'
        }
        
        # Set a timeout so a stalled request cannot hang the scraper
        response = requests.get(base_url, params=params, timeout=30)
        soup = BeautifulSoup(response.text, 'html.parser')
        
        # Find the specific table containing the names
        table = soup.find('table', {'cellspacing': '2', 'bgcolor': '#ddddd'})
        
        # Check if the table exists and has rows
        if not table or len(table.find_all('tr')) <= 1:  # Only header or no data
            print(f"No more data found for {category} ({gender}) at seed {seed}. Stopping seed increment.")
            break  # No more data to fetch, stop incrementing the seed
        
        rows = table.find_all('tr')[1:]  # Skip the header row
        
        for row in rows:
            cells = row.find_all('td')
            if len(cells) >= 4:  # Ensure at least 4 columns
                name = cells[0].text.strip()
                origin = cells[1].text.strip()
                sex = cells[2].text.strip()
                meaning = cells[3].text.strip()
                variations = cells[4].text.strip() if len(cells) > 4 else ""
                all_data.append({
                    'Category': category, 
                    'Gender': gender, 
                    'Name': name, 
                    'Origin': origin, 
                    'Sex': sex, 
                    'Meaning': meaning,
                    'Variations': variations
                })
        
        # Increment seed for the next page of data
        seed += increment
    
    return all_data

def main():
    all_data = []
    
    for gender in ['Male', 'Female']:
        for category in categories:
            print(f"\nStarting to scrape {category} for {gender} names...")
            data = scrape_data(category, gender)
            
            if data:
                print(f"Successfully scraped {len(data)} records for {category} ({gender}).")
            else:
                print(f"No data found for {category} ({gender}).")
            
            all_data.extend(data)
            print(f"Total records collected so far: {len(all_data)}")
    
    # Save to CSV
    df = pd.DataFrame(all_data)
    df.to_csv('african_names.csv', index=False)
    print("Data has been scraped and saved to 'african_names.csv'")

if __name__ == "__main__":
    main()



Starting to scrape African - Abaluhyan for Male names...
Fetching data for African - Abaluhyan (Male) with seed 0...
Fetching data for African - Abaluhyan (Male) with seed 50...
No more data found for African - Abaluhyan (Male) at seed 50. Stopping seed increment.
Successfully scraped 1 records for African - Abaluhyan (Male).
Total records collected so far: 1

Starting to scrape African - Akan for Male names...
Fetching data for African - Akan (Male) with seed 0...
Fetching data for African - Akan (Male) with seed 50...
No more data found for African - Akan (Male) at seed 50. Stopping seed increment.
Successfully scraped 2 records for African - Akan (Male).
Total records collected so far: 3

Starting to scrape African - American for Male names...
Fetching data for African - American (Male) with seed 0...
Fetching data for African - American (Male) with seed 50...
No more data found for African - American (Male) at seed 50. Stopping seed increment.
Successfully scraped 7 records for Af
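
The pagination in `scrape_data` relies on the `seed` query parameter advancing by 50 until a page comes back without data rows. As a minimal sketch that needs no network access, the query strings for one category can be previewed with the standard library. `build_page_url` is a hypothetical helper, not part of the notebook, and the encoding shown is what `urlencode` produces, which should be equivalent to what `requests` sends.

```python
from urllib.parse import urlencode

BASE_URL = "https://incompetech.com/named/multi.pl"

def build_page_url(category, gender, seed):
    """Return the URL for one results page (hypothetical helper for illustration)."""
    params = {
        "sex": gender,
        "language": category,
        "method": "contains",
        "name": "",
        "meaning": "",
        "seed": seed,
        "action": "search",
    }
    return f"{BASE_URL}?{urlencode(params)}"

# Preview the first two pages for one category; seed advances in steps of 50.
page_urls = [build_page_url("African - Akan", "Male", seed) for seed in (0, 50)]
for url in page_urls:
    print(url)
```

Printing the URLs before a long scrape is a cheap way to confirm that category names with spaces and hyphens are encoded as expected.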

# **Data Preprocessing**

To ensure that our dataset for the Baby Name Generator is both accurate and user-friendly, we undertook several key preprocessing steps:

**Removing Unnecessary Columns**: We dropped the columns the generator does not use (`Sex` and `Variations`), which keeps the dataset focused on the relevant attributes.

**Adding New Columns**:

- **Name Length**: We calculated the length of each name to provide an additional parameter for users who prefer names of a specific length.

- **First Letter**: We extracted the first letter of each name, allowing users to filter names by their starting letter.

**Transforming Name Format**: To keep the names consistent and readable, we converted them to InitCap format, which capitalizes the first letter of each name and lowercases the rest.

These preprocessing steps were crucial for refining the dataset and ensuring that the Baby Name Generator delivers accurate, well-organized, and easily accessible name suggestions.
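
One detail worth checking: `str.title()`, which the notebook uses for InitCap formatting, capitalizes the first letter of every alphabetic run, not only the first letter of the string. The sketch below uses made-up input strings (assumptions for illustration, not taken from the dataset) to show where it differs from `str.capitalize()`.

```python
# Hypothetical inputs: an all-caps name, a two-word name, a name with an apostrophe.
samples = ["JIMIYU", "mary ann", "n'dea"]

titled = [s.title() for s in samples]            # capitalizes each alphabetic run
capitalized = [s.capitalize() for s in samples]  # capitalizes only the first character

for raw, t, c in zip(samples, titled, capitalized):
    print(f"{raw!r}: title={t!r}, capitalize={c!r}")
```

For single-word names the two methods agree; names containing spaces or apostrophes may deserve a manual check after conversion.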

In [3]:
# Load the scraped CSV file
df = pd.read_csv('african_names.csv')

In [4]:
import pandas as pd

# Load the existing CSV file
df = pd.read_csv('african_names.csv')

# Drop the 'Sex' and 'Variations' columns
df = df.drop(columns=['Sex', 'Variations'])

# Save the modified DataFrame back to the original CSV file
df.to_csv('african_names.csv', index=False)
print("Removed 'Sex' and 'Variations' columns from african_names.csv.")




Removed 'Sex' and 'Variations' columns from african_names.csv.


In [5]:
print(df)

                 Category  Gender     Name               Origin  \
0     African - Abaluhyan    Male   Jimiyu  African---Abaluhyan   
1          African - Akan    Male   Donkor       African---Akan   
2          African - Akan    Male   Minkah       African---Akan   
3      African - American    Male   Chikae   African---American   
4      African - American    Male  Cleavon   African---American   
...                   ...     ...      ...                  ...   
1284               Arabic  Female   Zarifa               Arabic   
1285               Arabic  Female   Zaynab               Arabic   
1286               Arabic  Female  Zubaida               Arabic   
1287               Arabic  Female  Zuleika               Arabic   
1288               Arabic  Female   Zurafa               Arabic   

                         Meaning  
0     Born during the dry season  
1                         Humble  
2                   Light haired  
3                   Power

Next, we convert the names to InitCap format and add two new columns: name length and first letter.

In [6]:
# Ensure that all names are in InitCap format (e.g., 'John' instead of 'john' or 'JOHN')
df['Name'] = df['Name'].str.title()

# Add a new column for the length of each name
df['Name Length'] = df['Name'].str.len()

# Add a new column for the first letter of each name
df['First Letter'] = df['Name'].str[0]

# Save the modified DataFrame back to the original CSV file
df.to_csv('african_names.csv', index=False)

print("All names have been converted to InitCap format, and new columns 'Name Length' and 'First Letter' have been added.")

All names have been converted to InitCap format, and new columns 'Name Length' and 'First Letter' have been added.


In [7]:
print(df)

                 Category  Gender     Name               Origin  \
0     African - Abaluhyan    Male   Jimiyu  African---Abaluhyan   
1          African - Akan    Male   Donkor       African---Akan   
2          African - Akan    Male   Minkah       African---Akan   
3      African - American    Male   Chikae   African---American   
4      African - American    Male  Cleavon   African---American   
...                   ...     ...      ...                  ...   
1284               Arabic  Female   Zarifa               Arabic   
1285               Arabic  Female   Zaynab               Arabic   
1286               Arabic  Female  Zubaida               Arabic   
1287               Arabic  Female  Zuleika               Arabic   
1288               Arabic  Female   Zurafa               Arabic   

                         Meaning  Name Length First Letter  
0     Born during the dry season            6            J  
1                         Humble           

# **Relevant Word Extraction**

To let the Baby Name Generator filter names by meaning, we assigned each meaning to one of a fixed set of thematic categories. Here’s an overview of the process:

**Categorizing Meanings**: We used a zero-shot classification model to assign each name’s meaning to one of 13 predefined categories (for example, Nature, Virtues, or Birth Order). This approach needs no labeled training data for our categories. A meaning whose top category scores below 0.5 is labeled `nothing`.

**Model Used**: `facebook/bart-large-mnli`, loaded through the Hugging Face Transformers `pipeline` API. The model is trained on natural language inference, which lets it score how well a text matches arbitrary candidate labels without explicit training on those labels.

**User Input Integration**: This categorization will be integrated into the Baby Name Generator, allowing users to input their desired categories or themes. The generator will then filter and suggest names that match those categories based on the meanings.

Zero-shot classification gives the Baby Name Generator a meaning-based filter without any manual labeling. In the sample output below, many meanings fall under the 0.5 threshold and are labeled `nothing`, so the threshold and the label set are worth tuning.
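
The label-selection step can be sketched without downloading the model. For a single input string, the zero-shot pipeline returns a dictionary whose `labels` and `scores` lists are sorted from most to least likely. The sequences and scores below are invented for illustration, and `pick_label` is a hypothetical helper that mirrors the thresholding idea.

```python
def pick_label(result, threshold=0.5, fallback="nothing"):
    """Return the top-ranked label if its score reaches the threshold (hypothetical helper)."""
    top_label = result["labels"][0]
    top_score = result["scores"][0]
    return top_label if top_score >= threshold else fallback

# Invented results, shaped like the pipeline's output for one input string.
confident = {
    "sequence": "example meaning A",
    "labels": ["God", "Strength and Power", "Nature"],
    "scores": [0.62, 0.30, 0.08],
}
uncertain = {
    "sequence": "example meaning B",
    "labels": ["Time", "Nature", "Birth Order"],
    "scores": [0.41, 0.35, 0.24],
}

print(pick_label(confident))                 # top score 0.62 clears 0.5
print(pick_label(uncertain))                 # top score 0.41 falls short
print(pick_label(uncertain, threshold=0.4))  # a lower threshold accepts it
```

In the pipeline's default single-label mode the scores across all candidate labels sum to 1, so with 13 labels a threshold of 0.5 is fairly strict. Lowering the threshold, or passing `multi_label=True` so each label is scored independently, are two options to experiment with.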

In [10]:
import pandas as pd
from transformers import pipeline

# Keep only rows with a non-empty 'Meaning'; copy so the column assignment
# below works on an independent DataFrame and not on a view of the original
df = df[df['Meaning'].notna() & (df['Meaning'].str.strip() != '')].copy()

# Define candidate labels for classification
candidate_labels = ['Nature', 'God', 'Intellectual Qualities', 'Virtues', 'Strength and Power', 
                    'Leadership and Royalty', 'Social Roles', 'Family relations', 'Birth Order', 
                    'Emotions', 'Objects', 'Time','Love']

# Initialize the zero-shot classifier
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Classify one meaning string; return 'nothing' when no label reaches the threshold
def classify_sentence(sentence, threshold=0.5):
    if isinstance(sentence, str) and sentence.strip():  # Ensure sentence is a non-empty string
        result = classifier(sentence, candidate_labels)
        # Get the label with the highest score and its score
        top_label = result['labels'][0]
        top_score = result['scores'][0]
        # Check if the top score meets the threshold
        if top_score >= threshold:
            return top_label
        else:
            return 'nothing'
    return 'nothing'  # Input was missing or an empty string

# Classify every meaning; this overwrites the scraped 'Category' column
df['Category'] = df['Meaning'].apply(classify_sentence)

# Display the updated DataFrame
print(df.head(20))


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


       Category Gender      Name               Origin  \
0       nothing   Male    Jimiyu  African---Abaluhyan   
1       nothing   Male    Donkor       African---Akan   
2       nothing   Male    Minkah       African---Akan   
3       nothing   Male    Chikae   African---American   
4       nothing   Male   Cleavon   African---American   
5       nothing   Male      Elon   African---American   
6       nothing   Male      Kivi   African---American   
7       Virtues   Male  Laphonso   African---American   
8       nothing   Male     Masio   African---American   
9   Birth Order   Male  Quadrees   African---American   
10      nothing   Male    Baruti   African---Botswana   
11      nothing   Male   Fenyang   African---Botswana   
12      nothing   Male  Kefentse   African---Botswana   
13      nothing   Male    Kopano   African---Botswana   
14      nothing   Male   Montsho   African---Botswana   
15      nothing   Male      Tale   African---Botswana   
16      nothing   Male       Ta

In [11]:
print(df)

     Category  Gender     Name               Origin  \
0     nothing    Male   Jimiyu  African---Abaluhyan   
1     nothing    Male   Donkor       African---Akan   
2     nothing    Male   Minkah       African---Akan   
3     nothing    Male   Chikae   African---American   
4     nothing    Male  Cleavon   African---American   
...       ...     ...      ...                  ...   
1284  nothing  Female   Zarifa               Arabic   
1285   Nature  Female   Zaynab               Arabic   
1286   Nature  Female  Zubaida               Arabic   
1287  nothing  Female  Zuleika               Arabic   
1288  nothing  Female   Zurafa               Arabic   

                         Meaning  Name Length First Letter  
0     Born during the dry season            6            J  
1                         Humble            6            D  
2                   Light haired            6            M  
3                   Power of God            6            C  
4                          Cliff  

In [17]:
# Save the classified DataFrame as a CSV file in the Kaggle working directory
df.to_csv('/kaggle/working/names.csv', index=False)


# **Conclusion**

In summary, the Baby Name Generator is designed to offer a personalized and intuitive naming experience for parents. The dataset prepared in this notebook supports the following capabilities:

User-Defined Criteria:

**Gender**: Users can specify whether they are looking for names for boys, girls, or any gender.

**Origin**: Users can select names based on their geographical origins.

**Name Length**: Users can filter names by their length, choosing from short, medium, or long names.

**First Letter**: Users can specify the starting letter of the names they are interested in.

**Meaning**: Users can enter the categories or themes they want, and the generator will filter names using the zero-shot classification of each meaning.
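
The notebook stops at the prepared dataset, so the following is only a sketch of how these criteria might be combined, assuming pandas is installed. The rows are copied from the notebook output above; `filter_names` and its parameter names are hypothetical, and explicit length bounds stand in for the short, medium, and long buckets, which are not defined here.

```python
import pandas as pd

# A few rows copied from the notebook output above.
names = pd.DataFrame({
    "Name": ["Jimiyu", "Donkor", "Zaynab", "Zubaida"],
    "Gender": ["Male", "Male", "Female", "Female"],
    "Origin": ["African---Abaluhyan", "African---Akan", "Arabic", "Arabic"],
    "Category": ["nothing", "nothing", "Nature", "Nature"],
})
names["Name Length"] = names["Name"].str.len()
names["First Letter"] = names["Name"].str[0]

def filter_names(df, gender=None, origin=None, first_letter=None,
                 min_length=None, max_length=None, category=None):
    """Keep rows matching every criterion that is not None (hypothetical helper)."""
    mask = pd.Series(True, index=df.index)
    if gender is not None:
        mask &= df["Gender"] == gender
    if origin is not None:
        # Case-insensitive substring match, e.g. "akan" matches "African---Akan"
        mask &= df["Origin"].str.contains(origin, case=False, regex=False)
    if first_letter is not None:
        mask &= df["First Letter"] == first_letter.upper()
    if min_length is not None:
        mask &= df["Name Length"] >= min_length
    if max_length is not None:
        mask &= df["Name Length"] <= max_length
    if category is not None:
        mask &= df["Category"] == category
    return df[mask]

matches = filter_names(names, gender="Female", category="Nature", max_length=6)
print(matches["Name"].tolist())
```

Leaving a criterion as `None` skips it, which covers the "any gender" case without a separate code path.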


Enhanced User Experience: 

With these features, the Baby Name Generator empowers parents to make informed and meaningful choices for their child’s name. It combines cultural richness with modern technology, providing a streamlined and enjoyable naming process.

Overall, the Baby Name Generator is a valuable tool for anyone seeking a name that perfectly fits their preferences and values, making the name selection process both easier and more meaningful.