Logistic Regression Model

In [7]:
import pandas as pd
import numpy as np
from datasets import load_dataset
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.ensemble import RandomForestClassifier

In [8]:
# load data: first 50,000 training rows and up to 12,500 test rows
dataset = load_dataset("md_gender_bias", "convai2_inferred")
train = dataset['train'].to_pandas()[:50000]
test = dataset['test'].to_pandas()[:12500]

# bag-of-words counts; the vocabulary is learned from the training text only
vectorizer = CountVectorizer()
trainTexts = vectorizer.fit_transform(train['text'])
testTexts = vectorizer.transform(test['text'])

# note: toarray() turns the sparse count matrices into dense ones, which uses far more memory
xTrain = pd.DataFrame(trainTexts.toarray(), columns=vectorizer.get_feature_names_out())
xTest = pd.DataFrame(testTexts.toarray(), columns=vectorizer.get_feature_names_out())
yTrain = train['binary_label']
yTest = test['binary_label']

# train model (default settings, so max_iter is 100)
model = LogisticRegression()
model.fit(xTrain, yTrain)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
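
The warning above says the solver hit its iteration limit before converging. A minimal sketch of one way to address it, assuming a toy stand-in corpus instead of the real dataset: raise `max_iter` and fit on the sparse matrix that `CountVectorizer` returns rather than a dense DataFrame. The texts, the label encoding (0 = female, 1 = male) and the value 1000 are illustrative assumptions, not settings from this notebook.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in corpus (hypothetical); the notebook uses train['text'] and train['binary_label'].
train_texts = [
    "she said her sister was visiting",
    "her mother told her a story",
    "he said his brother was visiting",
    "his father told him a story",
]
train_labels = [0, 0, 1, 1]  # assumed encoding: 0 = female, 1 = male

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # stays a SciPy sparse matrix, no toarray()

# max_iter=1000 is an illustrative value, not a tuned one.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, train_labels)

# New text must go through the same fitted vectorizer.
X_new = vectorizer.transform(["she lost her keys"])
proba = model.predict_proba(X_new)

print(model.classes_)  # column order used by predict_proba
print(proba)
```

Whether the real model converges with a larger `max_iter` would need to be checked by rerunning the cell; the accuracy reported below comes from the unconverged fit.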


Predicting on the test dataset

In [12]:
# accuracy
pred = model.predict(xTest)
accuracy = accuracy_score(yTest, pred)
print("Accuracy:", accuracy)

# confusion matrix
cm = confusion_matrix(yTest, pred)
print("Confusion matrix:", cm)

Accuracy: 0.8034867324701961
Confusion matrix: [[2378  940]
 [ 593 3890]]
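
The accuracy can be recomputed from the confusion matrix, and the same counts give per-class precision and recall. A small worked sketch using the numbers printed above; it assumes scikit-learn's convention (rows are true labels, columns are predicted labels) and the label encoding 0 = female, 1 = male implied by the later printout.

```python
# Counts copied from the output above; rows are true labels, columns are predictions.
cm = [[2378, 940],
      [593, 3890]]

# Assumed label encoding, inferred from the later printout: 0 = female, 1 = male.
names = {0: "female", 1: "male"}

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total
print("Accuracy:", accuracy)

metrics = {}
for k in (0, 1):
    true_k = sum(cm[k])            # examples whose true label is k
    pred_k = cm[0][k] + cm[1][k]   # examples predicted as k
    recall = cm[k][k] / true_k
    precision = cm[k][k] / pred_k
    metrics[names[k]] = (precision, recall)
    print(f"{names[k]}: precision={precision:.4f} recall={recall:.4f}")
```

Under those assumptions, recall is about 0.72 for label 0 and about 0.87 for label 1, so the model misses label-0 examples more often than label-1 examples even though the two precisions are close.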


Predicting gender bias from text response

In [10]:
text = [
"""
Once upon a time in the vibrant city of Harmonyville , there lived a college student named Mia Rodriguez . Mia was a junior majoring in environmental science at Rivertide University . Her love for nature and a deep sense of responsibility towards the planet fueled her determination to make a positive impact . One sunny afternoon , Mia stumbled upon a notice about the annual Green Innovation Challenge —an event where students could pitch eco -friendly projects to a panel of environmental experts . Inspired and eager to contribute , Mia decided to develop a sustainable urban gardening initiative called  "GreenHaven . " With her hands in the soil and a heart full of passion , Mia transformed an unused corner of the campus into a thriving community garden . She envisioned GreenHaven as a place where students could come together , learn about sustainable agriculture , and cultivate their own fruits and vegetables . Mia believed that this initiative could not only promote environmental consciousness but also foster a sense of community among her peers . As the garden flourished , so did Mia 's connection with her fellow students . The project became a hub of creativity , where ideas for sustainable living blossomed alongside the vibrant array of fruits and vegetables . Mia 's dedication and leadership drew the attention of both students and faculty alike . When the day of the Green Innovation Challenge arrived , Mia nervously but proudly presented GreenHaven to the panel of judges . The vision , dedication , and positive impact of her project resonated deeply , earning her the first prize and a scholarship for further environmental studies . Word of Mia 's success spread , and GreenHaven became a symbol of sustainable living on campus . Mia 's journey didn 't end with the competition ; instead , it marked the beginning of a new chapter . 
With the scholarship in hand , Mia continued her studies , conducting research on innovative ways to create sustainable urban environments . As Mia graduated from Rivertide University , she left behind a legacy of green initiatives and a campus that had been transformed by the power of community and sustainability . GreenHaven continued to thrive , inspiring future generations of students to think creatively about environmental issues . Mia 's story became a beacon of hope , showing that even a single college student with a passion for change could make a lasting impact on the world . And so , as Mia embarked on her journey beyond college , she carried with her not just a degree but the knowledge that small , meaningful actions could ripple into waves of positive transformation for the planet and its people .
""",
"""
Once upon a time in the vibrant city of Harmonyville , there lived a college student . This student was a junior majoring in environmental science at Rivertide University . Their love for nature and a deep sense of responsibility towards the planet fueled their determination to make a positive impact . One sunny afternoon , the student stumbled upon a notice about the annual Green Innovation Challenge —an event where students could pitch eco -friendly projects to a panel of environmental experts . Inspired and eager to contribute , the student decided to develop a sustainable urban gardening initiative called  "GreenHaven . " With hands in the soil and a heart full of passion , the student transformed an unused corner of the campus into a thriving community garden . They envisioned GreenHaven as a place where students could come together , learn about sustainable agriculture , and cultivate their own fruits and vegetables . The student believed that this initiative could not only promote environmental consciousness but also foster a sense of community among their peers . As the garden flourished , so did the student 's connection with their fellow students . The project became a hub of creativity , where ideas for sustainable living blossomed alongside the vibrant array of fruits and vegetables . The student 's dedication and leadership drew the attention of both students and faculty alike . When the day of the Green Innovation Challenge arrived , the student nervously but proudly presented GreenHaven to the panel of judges . The vision , dedication , and positive impact of their project resonated deeply , earning them the first prize and a scholarship for further environmental studies . Word of the student 's success spread , and GreenHaven became a symbol of sustainable living on campus . The student 's journey didn 't end with the competition ; instead , it marked the beginning of a new chapter . 
With the scholarship in hand , the student continued their studies , conducting research on innovative ways to create sustainable urban environments . As the student graduated from Rivertide University , they left behind a legacy of green initiatives and a campus that had been transformed by the power of community and sustainability . GreenHaven continued to thrive , inspiring future generations of students to think creatively about environmental issues . The student 's story became a beacon of hope , showing that even a single college student with a passion for change could make a lasting impact on the world . And so , as the student embarked on their journey beyond college , they carried with them not just a degree but the knowledge that small , meaningful actions could ripple into waves of positive transformation for the planet and its people .
""",
"""
Once upon a time in the bustling city of Arcadia , there lived a college student named Alex Reynolds . Alex was a junior majoring in computer science at the prestigious Arcadia University . He was a diligent student with a passion for coding and a penchant for exploring the world of technology . One day , as Alex was immersed in his studies at the campus library , he stumbled upon an intriguing flyer . he he him him
""",
]
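# reuse the fitted vectorizer so the columns match the training features
vectorizedText = vectorizer.transform(text)
textsTransformed = pd.DataFrame(vectorizedText.toarray(), columns=vectorizer.get_feature_names_out())

# predict_proba columns follow model.classes_: column 0 is label 0 (printed as Female),
# column 1 is label 1 (printed as Male)
pred = model.predict_proba(textsTransformed)
for i in range(len(pred)):
    print(i, ":")
    print("Male: ", pred[i][1])
    print("Female: ", pred[i][0])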





0 :
Male:  3.608008026407589e-21
Female:  1.0
1 :
Male:  0.9999999999981692
Female:  1.830757767606883e-12
2 :
Male:  0.9999999999999902
Female:  9.769962616701378e-15
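
The probabilities above are extremely close to 0 or 1. One likely contributor is that logistic regression on raw counts adds up a weight for every token occurrence, so a long passage accumulates large log-odds. A minimal sketch of that effect; the token weights here are hypothetical and are not read from the fitted model.

```python
import math

def sigmoid(z: float) -> float:
    """Map log-odds z to a probability, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical per-token weights (log-odds toward "male"); not taken from the fitted model.
weights = {"he": 1.5, "his": 1.2, "she": -1.5, "her": -1.2}
intercept = 0.0

def male_probability(tokens):
    """P(male) for a bag of tokens: sigmoid of the intercept plus the summed token weights."""
    z = intercept + sum(weights.get(tok, 0.0) for tok in tokens)
    return sigmoid(z)

short_text = ["she", "her"]        # two gendered tokens
long_text = ["she", "her"] * 15    # the same tokens repeated, as in a long story

print(male_probability(short_text))
print(male_probability(long_text))
```

The second value is many orders of magnitude smaller than the first, although both inputs contain the same words. This only illustrates the mechanism. Explaining why the gender-neutral passage scores as male would require inspecting the fitted `model.coef_` against that passage's token counts.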
