## Import Libraries and Download Dependencies
This cell imports the required libraries (random, os, glob, pandas, numpy, and datetime), plus lorem for generating lorem ipsum text and nltk for natural-language data. It also downloads the WordNet corpus through nltk; its noun entries are used later to generate random titles.

In [18]:
import random
import os
import glob
import pandas as pd
import numpy as np
from datetime import datetime

# Import necessary libraries for text generation
import lorem
import nltk

# Download the WordNet data required by nltk
nltk.download('wordnet')

# Import WordNet from nltk.corpus to work with synonyms
from nltk.corpus import wordnet as wn


[nltk_data] Downloading package wordnet to /home/s186/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


## Generate a List of Nouns from WordNet
This cell builds a list of noun lemma names from WordNet. It iterates over every noun synset returned by wn.all_synsets('n') and collects the name of each lemma in that synset.

In [19]:
noun_list = []
for synset in wn.all_synsets('n'):
    for lemma in synset.lemmas():
        noun_list.append(lemma.name())
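
WordNet joins multiword lemma names with underscores (the `Felis_concolor` title in a later output shows this), and the same lemma name can appear in more than one synset. The sketch below shows one way to tidy such a list before using it for titles. `clean_lemma_names` is a hypothetical helper that is not part of the original notebook, and the sample list stands in for `noun_list`.

```python
def clean_lemma_names(names):
    """Replace underscores with spaces and drop duplicates.

    Hypothetical helper, not part of the original notebook.
    The order of first appearance is preserved.
    """
    seen = set()
    cleaned = []
    for name in names:
        readable = name.replace("_", " ")
        if readable not in seen:
            seen.add(readable)
            cleaned.append(readable)
    return cleaned


# Stand-in sample; in the notebook this would be noun_list
sample = ["hogfish", "Felis_concolor", "hogfish", "yacht"]
print(clean_lemma_names(sample))  # ['hogfish', 'Felis concolor', 'yacht']
```

Whether to deduplicate is a design choice: keeping duplicates makes lemmas that occur in several synsets more likely to be picked.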


## Load First and Last Names from a CSV File
This cell loads a list of first and last names from a CSV file into Python using pandas.



In [20]:
# A collection of first and last names collected from a CSV
result = pd.read_csv('/home/s186/Downloads/Random_names_master.csv')
first_name_list = result['first_name'].tolist()
last_name_list = result['last_name'].tolist()
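
The path above is specific to one machine, and the `.rstrip()` call used later in `gen_random_name` suggests that some names may carry stray whitespace. As a rough sketch of a more defensive loader, the example below reads the same two columns from an in-memory CSV using only the standard library, strips whitespace, and skips empty cells. The function `load_names` and the sample rows are assumptions made for illustration; the real contents of `Random_names_master.csv` are not reproduced here.

```python
import csv
import io

# Made-up stand-in rows; the real CSV file is not reproduced here
sample_csv = io.StringIO(
    "first_name,last_name\n"
    "aubrey t ,coleman\n"
    ",jones\n"
    "mark,osborne\n"
)


def load_names(handle):
    """Read first_name/last_name columns, stripping whitespace and skipping blanks."""
    first_names, last_names = [], []
    for row in csv.DictReader(handle):
        first = (row.get("first_name") or "").strip()
        last = (row.get("last_name") or "").strip()
        if first:
            first_names.append(first)
        if last:
            last_names.append(last)
    return first_names, last_names


first_names, last_names = load_names(sample_csv)
print(first_names)  # ['aubrey t', 'mark']
print(last_names)   # ['coleman', 'jones', 'osborne']
```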


## Define the DataGen Class
This cell defines a class DataGen to generate random data for books and characters. Several methods are defined for generating names, titles, descriptions, and other attributes.

In [21]:
class DataGen:
    """
    Class for generating random data such as names, titles, descriptions, etc.
    """
    def __init__(self, size):
        self._size = size
        
    def list_pick(self, choice_list, uniform=True, probability_array=None):
        """Randomly picks self._size elements from a list, with optional probabilities."""
        if uniform:
            # Uniform sampling with replacement
            return [random.choice(choice_list) for _ in range(self._size)]
        if probability_array is None:
            # Fail loudly: silently returning None would break DataFrame construction
            raise ValueError('Please choose a probability array for non-uniform selection')
        # Weighted sampling with replacement; the probabilities must sum to 1
        return np.random.choice(choice_list, self._size, p=probability_array).tolist()

    def descr_gen(self):
        """Generates a list of lorem ipsum sentences."""
        result_list = []
        for i in range(self._size):
            result_list.append(lorem.sentence())
        return result_list
    
    def title_gen(self, list_of_words=noun_list):
        """Generates random titles using two randomly chosen nouns."""
        result_list = []
        for i in range(self._size):
            result_list.append(list_of_words[random.randint(0, len(list_of_words)-1)].lstrip() + " " + list_of_words[random.randint(0, len(list_of_words)-1)])
        return result_list
    
    def date_gen(self, start_year=1950, end_year=2020, set_month=None):
        """Generates random 'YYYY/M/D' date strings; end_year is exclusive."""
        result_list = []
        for i in range(self._size):
            year = random.choice(range(start_year, end_year))
            month = random.choice(range(1, 13)) if set_month is None else set_month
            # Days are capped at 28 so every (month, day) pair is a valid date
            day = random.choice(range(1, 29))
            result_list.append(f'{year}/{month}/{day}')
        return result_list
    
    def gen_random_name(self, list_1=first_name_list, list_2=last_name_list):
        """Generates random full names using two lists of first and last names."""
        result_list = []
        for i in range(self._size):
            name = f"{list_1[random.randint(0, len(list_1)-1)].rstrip()} {list_2[random.randint(0, len(list_2)-1)]}"
            result_list.append(name)
        return result_list
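
The `list_pick` method draws with replacement, either uniformly or according to a probability array. The sketch below illustrates the same idea with the standard library's `random.choices` and a locally seeded generator, so repeated runs give the same draws. `weighted_pick` and its `seed` parameter are hypothetical and are not part of the `DataGen` class.

```python
import random


def weighted_pick(choices, size, weights=None, seed=None):
    """Draw `size` items with replacement; uniform when `weights` is None."""
    if weights is not None and len(weights) != len(choices):
        raise ValueError("weights must have one entry per choice")
    rng = random.Random(seed)  # local generator, leaves the global state untouched
    return rng.choices(choices, weights=weights, k=size)


genres = ["Satire", "Drama", "Romance", "Self help"]
first = weighted_pick(genres, 10, weights=[0.4, 0.4, 0.1, 0.1], seed=42)
second = weighted_pick(genres, 10, weights=[0.4, 0.4, 0.1, 0.1], seed=42)
print(first == second)  # True: the same seed gives the same draws
```

Seeding is optional, but it makes generated test data reproducible across notebook runs.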


## Generate Book Data
This cell generates random book data (e.g., author names, publishing dates, titles, descriptions) using the DataGen class and stores the data in a pandas DataFrame.

In [22]:
Books = DataGen(10)  # Create 10 book records
book_data = pd.DataFrame({
    'author_name': Books.gen_random_name(),
    'publishing_date': Books.date_gen(2000, 2018),
    'title': Books.title_gen(),
    'descr': Books.descr_gen(),
    'book_type': Books.list_pick(['Fiction', 'Non-Fiction'], False, [0.8, 0.2]),
    'author_sex': Books.list_pick(['Male', 'Female'], False, [0.4, 0.6]),
    'genre': Books.list_pick(['Satire', 'Drama', 'Romance', 'Self help'], False, [0.4, 0.4, 0.1, 0.1]),
    'publisher': Books.list_pick(['Hachette', 'HarperCollins', 'Macmillan', 'Penguin Random'])
})
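
`date_gen` returns plain strings such as `2002/11/7`, without zero padding. If real date values are needed, for sorting or filtering for example, the strings can be parsed afterwards. The sketch below does this with `datetime.strptime`; `parse_generated_date` is a hypothetical helper, and the sample strings are copied from the book table shown later in this notebook.

```python
from datetime import datetime


def parse_generated_date(text):
    """Parse a 'YYYY/M/D' string, as produced by date_gen, into a date object."""
    return datetime.strptime(text, "%Y/%m/%d").date()


# Values copied from the book table shown later in this notebook
samples = ["2002/11/7", "2000/12/16", "2005/4/14"]
parsed = [parse_generated_date(s) for s in samples]
print([d.isoformat() for d in parsed])  # ['2002-11-07', '2000-12-16', '2005-04-14']
```

For a whole column, `pd.to_datetime(book_data['publishing_date'], format='%Y/%m/%d')` should give the equivalent result in pandas.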


## Generate Character Data
This cell generates random character data using the DataGen class and stores the data in a pandas DataFrame.

In [23]:
character = DataGen(50)  # Create 50 character records
character_data = pd.DataFrame({
    'character_name': character.gen_random_name(),
    'character_type': character.list_pick(['human', 'alien', 'other'], False, [0.5, 0.3, 0.2])
})
character_data.head()  # Display the first few rows


Unnamed: 0,character_name,character_type
0,aubrey t coleman,human
1,pedro j jones,human
2,nathan d boyce-reid,alien
3,mark osborne,human
4,mark a carter,human
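
With only 50 records, the observed share of each character type will only approximate the requested probabilities of 0.5, 0.3, and 0.2. The sketch below compares expected and observed shares for one seeded draw. It uses the standard library's `random.choices` as a stand-in for `list_pick`, so the exact numbers will differ from the notebook's output.

```python
import random
from collections import Counter

rng = random.Random(0)  # seeded stand-in for the notebook's random draws
types = ["human", "alien", "other"]
weights = [0.5, 0.3, 0.2]

draws = rng.choices(types, weights=weights, k=50)
counts = Counter(draws)
for name, weight in zip(types, weights):
    share = counts[name] / len(draws)
    print(f"{name}: expected {weight:.2f}, observed {share:.2f}")
```

In the notebook itself, `character_data['character_type'].value_counts(normalize=True)` reports the observed shares directly.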


## Generate Character-Book Relationships
The cells below regenerate the book and character tables, then draw 100 random (book, character) pairs and store them in a pandas DataFrame.

In [24]:
Books = DataGen(10)  # Create 10 book records
book_data = pd.DataFrame({
    'author_name': Books.gen_random_name(),
    'publishing_date': Books.date_gen(2000, 2018),
    'title': Books.title_gen(),
    'descr': Books.descr_gen(),
    'book_type': Books.list_pick(['Fiction', 'Non - Fiction'], False, [0.8, 0.2]),
    'author_sex': Books.list_pick(['Male', 'Female'], False, [0.4, 0.6]),
    'genre': Books.list_pick(['Satire', 'Drama', 'Romance', 'Self help'], False, [0.4, 0.4, 0.1, 0.1]),
    'publisher': Books.list_pick(['Hachette', 'HarperCollins', 'Macmillan', 'Penguin Random'])
})

In [25]:
book_data

Unnamed: 0,author_name,publishing_date,title,descr,book_type,author_sex,genre,publisher
0,horace king,2002/11/7,hogfish yacht,Quisquam adipisci magnam dolorem magnam amet.,Fiction,Female,Drama,Macmillan
1,khalid mcdowell,2000/12/16,Rickover Hess,Dolore quisquam dolorem quisquam velit non.,Non - Fiction,Female,Drama,Macmillan
2,roosevelt gipson,2005/4/14,Ypres chuckhole,Modi dolor dolore non est modi quisquam modi.,Fiction,Male,Satire,Hachette
3,kelyn t henderson,2007/5/18,long-legs competition,Labore ut porro amet dolorem porro.,Fiction,Male,Drama,Hachette
4,vernell l railey,2012/7/26,muscle freethinking,Dolore modi porro quisquam magnam neque ut neque.,Fiction,Female,Satire,Hachette
5,jamar a morales,2016/12/3,Felis_concolor still,Amet etincidunt magnam non aliquam.,Non - Fiction,Female,Romance,Macmillan
6,kobi b jetin,2005/11/22,camouflage level,Ipsum neque porro ipsum dolor est.,Fiction,Female,Drama,HarperCollins
7,troy d petitfrere,2015/4/12,hank manoeuvrability,Modi est quiquia ipsum quisquam eius est conse...,Fiction,Female,Self help,HarperCollins
8,willie w gilbert,2011/4/18,overlayer microgauss,Ipsum labore adipisci adipisci.,Fiction,Male,Drama,HarperCollins
9,eric edouard,2003/12/15,lymphogranuloma_venereum theory,Aliquam sit dolorem magnam.,Fiction,Female,Self help,Penguin Random


In [26]:
character = DataGen(50)  # Create 50 character records
character_data = pd.DataFrame({
    'character_name': character.gen_random_name(),
    'character_type': character.list_pick(['human', 'alien', 'other'], False, [0.5, 0.3, 0.2])
})
character_data.head()  # Display the first few rows

Unnamed: 0,character_name,character_type
0,astley schmidt,human
1,bashon b wilkins,other
2,devonta j bellamy,human
3,emma l blocker,alien
4,daimeyon d daughtry,human


In [28]:
character_book = DataGen(100)  # Create 100 relationships between books and characters
character_book_relationship = pd.DataFrame({
    # list_pick samples the values directly; pairing them with [''] through
    # gen_random_name would append a trailing space to every entry
    'book': character_book.list_pick(book_data['title'].tolist()),
    'character': character_book.list_pick(character_data['character_name'].tolist())
})
character_book_relationship.head()  # Display the first few rows


Unnamed: 0,book,character
0,Rickover Hess,brittney j jackson
1,long-legs competition,john d coger
2,hogfish yacht,andrew j english
3,lymphogranuloma_venereum theory,bashon b wilkins
4,Ypres chuckhole,ramon v crittendon
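
Because each of the 100 rows is drawn independently, the same (book, character) pair can appear more than once. If each relationship should be unique, one option is to collect the pairs in a set. The sketch below assumes that goal; `sample_relationships` is a hypothetical helper, and the short lists stand in for the title and character columns.

```python
import random


def sample_relationships(titles, characters, size, seed=None):
    """Draw up to `size` distinct (book, character) pairs."""
    rng = random.Random(seed)
    pairs = set()
    for _ in range(size):
        # Repeated pairs collapse automatically because pairs is a set
        pairs.add((rng.choice(titles), rng.choice(characters)))
    return sorted(pairs)


# Stand-in values taken from the outputs shown in this notebook
titles = ["Rickover Hess", "hogfish yacht"]
characters = ["astley schmidt", "bashon b wilkins", "emma l blocker"]
pairs = sample_relationships(titles, characters, size=100, seed=1)
print(len(pairs))  # at most 6, the number of possible combinations
```

The result can be passed to `pd.DataFrame(pairs, columns=['book', 'character'])` to get the same two-column layout as above.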
