# Data Analysis: Objective & Subjective Features

This notebook performs an exploratory analysis of the therapist dataset. It examines both **objective** and **subjective** features and reviews **unique records** to surface useful insights.

### Key Steps in the Analysis:
- **Objective Features**: Exploration of measurable data points that are quantifiable and often numerical in nature.
- **Subjective Features**: Investigation of data that relies on opinions, assessments, or qualitative descriptions.
- **Unique Records**: Identification of distinct and rare entries in the dataset that may provide additional insights or highlight any anomalies.

This analysis aims to provide a comprehensive understanding of the data and support decision-making for further model development or reporting.


## Path Creation

In [1]:
import os

# Build the path relative to the working directory: two levels up, then data/
path = os.path.join(os.path.dirname(os.path.dirname(os.getcwd())), "data", "therapist data.csv")
path

'c:\\Users\\Umer\\Desktop\\Recomendation-Therapist-End-to-End-Project\\data\\therapist data.csv'
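The path above only resolves correctly when the notebook runs from a folder two levels below the project root. As a minimal sketch of the same idea with `pathlib`, the helper below climbs a configurable number of levels. The function name `project_data_path`, the `levels_up` parameter, and the example directory are hypothetical and not part of the original notebook.

```python
from pathlib import Path


def project_data_path(start: Path, filename: str, levels_up: int = 2) -> Path:
    """Return <start climbed `levels_up` levels>/data/<filename>."""
    root = start
    for _ in range(levels_up):
        root = root.parent  # move one directory up
    return root / "data" / filename


# Hypothetical working directory two levels below the project root
example = project_data_path(Path("/project/notebooks/eda"), "therapist data.csv")
print(example.as_posix())
```

In the notebook itself, `Path.cwd()` would take the place of the hypothetical start directory.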

In [2]:
import pandas as pd
pd.set_option('display.max_columns', None)

data_df = pd.read_csv(path)
print(f"Names of cols: {list(data_df.columns)}, \n len of dataframe: {len(data_df)}")
data_df.head(3)

Names of cols: ['number', 'url', 'profile_title', 'profle_suffix', 'address', 'availablity', 'bio', 'license_number', 'image_url', 'fee', 'insurance', 'expetise', 'speciality', 'cities', 'countries', 'zip', 'age', 'participants', 'ethentisy', 'thrapy_Way', 'Education'], 
 len of dataframe: 3782


Unnamed: 0,number,url,profile_title,profle_suffix,address,availablity,bio,license_number,image_url,fee,insurance,expetise,speciality,cities,countries,zip,age,participants,ethentisy,thrapy_Way,Education
0,705-704-9066,https://www.psychologytoday.com/ca/therapists/...,Rebecca Crawford,Registered Psychotherapist (Qualifying),"Alliston, ON L9R",Available both in-person and online,"As a woman, it’s so SO hard to walk in those s...",Licensed by Province of Ontario / 16052,https://photos.psychologytoday.com/8ab60433-52...,"['Individual Sessions$140', 'Couple Sessions$1...",[],"['Addiction', 'Anger Management', 'Anxiety', '...","[""Women's Issues"", 'Eating Disorders', 'Domest...","['Alliston', 'Barrie', 'Midland']",[],"['L4N', 'L4R', 'L9R']","['Teen,', 'Adults,', 'Elders (65+)']","['Individuals,', 'Couples']",set(),"['Cognitive Behavioural (CBT)', 'Emotionally F...","['In Practice for 4 Years', 'AttendedAthabasca..."
1,437-800-2285,https://www.psychologytoday.com/ca/therapists/...,Imogen Tam,"Registered Social Worker,RSW(she, they)","Toronto, ON M4Y",Available online only,Having a non-judgemental space to talk to some...,Membership with Ontario College of Social Work...,https://photos.psychologytoday.com/1717c7cc-e0...,"['Individual Sessions$150', 'Pay by ACH Bank t...","['Desjardins', 'Green Shield Canada', 'Manulif...","['ADHD', 'Bisexual', 'Body Positivity', 'Caree...","['LGBTQ+', 'Anxiety', 'Navigating Multiracial/...",['Toronto'],[],"['M4Y', 'M5S']",['Adults'],['Individuals'],"['Asian,', 'Other Racial or Ethnic Background']","['Cognitive Behavioural (CBT)', 'Cultural Humi...","['Membership with OASW2022', 'Degree/Diploma f..."
2,289-278-8735,https://www.psychologytoday.com/ca/therapists/...,Shivani Dass,"Registered Social Worker,PhD,MA,MSW,RSW,CRC-rtd","Whitby, ON L1R",Available online only,"Problems are inevitable and normal in life, so...",Membership with Ontario College of Social Work...,https://photos.psychologytoday.com/001871a6-f1...,"['Individual Sessions$135', 'Couple Sessions$1...",[],"['ADHD', 'Alcohol Use', 'Anger Management', 'A...","['Anxiety', 'Addiction', 'Trauma and PTSD']","['London', 'Ottawa', 'Whitby']",[],"['K1A', 'L1R', 'N5V']","['Children (6 to 10),', 'Preteen,', 'Teen,', '...","['Individuals,', 'Couples,', 'Family']","['Indigenous Peoples,', 'Other Racial or Ethni...","['Acceptance and Commitment (ACT)', 'Art Thera...","['In Practice for 28 Years', 'Degree/Diploma f..."


In [3]:
objective_col = ["fee", "insurance", "expetise", "speciality",
                 "cities", "countries", "zip", "age", "participants",
                 "ethentisy", "thrapy_Way", "Education"]

data_df[objective_col]

Unnamed: 0,fee,insurance,expetise,speciality,cities,countries,zip,age,participants,ethentisy,thrapy_Way,Education
0,"['Individual Sessions$140', 'Couple Sessions$1...",[],"['Addiction', 'Anger Management', 'Anxiety', '...","[""Women's Issues"", 'Eating Disorders', 'Domest...","['Alliston', 'Barrie', 'Midland']",[],"['L4N', 'L4R', 'L9R']","['Teen,', 'Adults,', 'Elders (65+)']","['Individuals,', 'Couples']",set(),"['Cognitive Behavioural (CBT)', 'Emotionally F...","['In Practice for 4 Years', 'AttendedAthabasca..."
1,"['Individual Sessions$150', 'Pay by ACH Bank t...","['Desjardins', 'Green Shield Canada', 'Manulif...","['ADHD', 'Bisexual', 'Body Positivity', 'Caree...","['LGBTQ+', 'Anxiety', 'Navigating Multiracial/...",['Toronto'],[],"['M4Y', 'M5S']",['Adults'],['Individuals'],"['Asian,', 'Other Racial or Ethnic Background']","['Cognitive Behavioural (CBT)', 'Cultural Humi...","['Membership with OASW2022', 'Degree/Diploma f..."
2,"['Individual Sessions$135', 'Couple Sessions$1...",[],"['ADHD', 'Alcohol Use', 'Anger Management', 'A...","['Anxiety', 'Addiction', 'Trauma and PTSD']","['London', 'Ottawa', 'Whitby']",[],"['K1A', 'L1R', 'N5V']","['Children (6 to 10),', 'Preteen,', 'Teen,', '...","['Individuals,', 'Couples,', 'Family']","['Indigenous Peoples,', 'Other Racial or Ethni...","['Acceptance and Commitment (ACT)', 'Art Thera...","['In Practice for 28 Years', 'Degree/Diploma f..."
3,"['Individual Sessions$150', 'Couple Sessions$1...",['Medavie Blue Cross'],"['Addiction', 'Anger Management', 'Authenticit...","['Life Transitions', 'Anxiety', 'Self Esteem']",['Whitby'],[],['L1N'],"['Children (6 to 10),', 'Preteen,', 'Teen,', '...","['Individuals,', 'Couples,', 'Group']",set(),"['Art Therapy', 'Attachment-based', 'Clinical ...","['In Practice for 13 Years', 'Certificate from..."
4,"['Individual Sessions$190', 'Pay by ACH Bank t...","['Blue Cross', 'Canada Life | Great-West Life'...","['Anger Management', 'Anxiety', 'Coping Skills...","['Trauma and PTSD', 'Dissociative Disorders (D...","['Orleans', 'Ottawa']",[],"['K1C', 'K2E']",['Adults'],['Individuals'],set(),"['Accelerated Resolution Therapy (ART)', 'Acce...","['In Practice for 19 Years', 'Certificate from..."
...,...,...,...,...,...,...,...,...,...,...,...,...
3777,"['Individual Sessions$160', 'Sliding scale: ap...","['Blue Cross', 'Medavie Blue Cross', 'Veterans...","['Addiction', 'ADHD', 'Alcohol Use', 'Anger Ma...","['Trauma and PTSD', 'Grief', 'Sports Performan...",['Kingston'],[],"['K0H', 'K7K']","['Children (6 to 10),', 'Preteen,', 'Teen,', '...","['Individuals,', 'Couples,', 'Family']",set(),"['Acceptance and Commitment (ACT)', 'Art Thera...",[]
3778,"['Individual Sessions$45', 'Pay by American Ex...",[],"['Addiction', 'Body Image & Disordered Eating'...","['Mood Disorders', 'Trauma and PTSD', 'Anxiety']","['Richmond Hill', 'Vaughan']",[],"['L4C', 'L4E', 'L4K', 'L6A']","['Teen,', 'Adults']",['Individuals'],set(),"['Attachment-based', 'Cognitive Behavioural (C...",['Certificate from Trauma Treatment Certificat...
3779,"['Individual Sessions$130', 'Sliding scale: ap...","['CVAP | Crime Victim', 'Green Shield Canada',...","['Alcohol Use', 'Anger Management', 'Anxiety',...","['Family Conflict', 'Addiction', 'Thinking Dis...",['London'],[],"['N5W', 'N6J']","['Children (6 to 10),', 'Preteen,', 'Teen,', '...","['Individuals,', 'Couples,', 'Family']",set(),"['Christian Counselling', 'Cognitive Behaviour...","['In Practice for 16 Years', 'Certificate from..."
3780,"['Individual Sessions$145', 'Pay by ACH Bank t...",[],"['ADHD', 'Alcohol Use', 'Anxiety', 'Behavioura...","['Depression', 'Chronic Illness', 'Sexual Abuse']","['Guelph', 'Kitchener', 'Waterloo']",[],"['N0B', 'N2A', 'N2L']","['Adults,', 'Elders (65+)']","['Individuals,', 'Group']",set(),"['Christian Counselling', 'Expressive Arts', '...","['In Practice for 9 Years', 'Membership with I..."
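Each cell in these columns is a string that looks like a Python list (or `set()` in the `ethentisy` column), not an actual list. The sketch below shows one way such a cell could be parsed. The helper name `parse_list_cell` is hypothetical, and treating missing values as empty lists is an assumption, not something the notebook specifies.

```python
import ast


def parse_list_cell(cell):
    """Turn a stringified list such as "['a', 'b']" or "set()" into a list."""
    if not isinstance(cell, str):
        return []  # assumption: missing values (e.g. NaN) count as empty
    text = cell.strip()
    if text == "set()":
        return []  # ast.literal_eval accepts "set()" only on Python 3.9+
    return list(ast.literal_eval(text))


parsed_cities = parse_list_cell("['Alliston', 'Barrie', 'Midland']")
parsed_empty = parse_list_cell("set()")
parsed_missing = parse_list_cell(float("nan"))
print(parsed_cities, parsed_empty, parsed_missing)
```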


In [4]:
import ast
from pathlib import Path


class analysis:
    def __init__(self, dataframe):
        self.df = dataframe

    def objective_unique_record(self, col):
        '''
        Extract unique records from a column of lists in the DataFrame.

        This method processes a specified column in the DataFrame where each value is 
        a string representation of a list (e.g., "[item1, item2, item3]") and extracts 
        all unique elements across all rows in the column.

        Parameters:
            col (str): The name of the column in the DataFrame to process.

        Returns:
            set: A set containing all unique elements extracted from the lists in the column.

        Example:
            If the column contains:
                "[1, 2, 3]"
                "[2, 3, 4]"
            The result will be:
                {1, 2, 3, 4}
        '''
        print(f"{col}".center(50, "-"))
        unique_records = set()
        for records in self.df[col]:
            # Each cell is a string such as "['a', 'b']"; parse it into a Python object
            list_ = ast.literal_eval(records)
            unique_records.update(list_)
        print("Total Unique Records:", len(unique_records))
        return unique_records

    def save_in_csv(self, unique_records, filename):
        '''
        Save unique records to a CSV file in the `data/unique_records/` directory.

        Parameters:
            unique_records (set): The unique records to save.
            filename (str): The name of the output file. It is used exactly as given,
                so include the `.csv` extension if the file should have one.
        '''
        # Define the directory for unique records
        unique_records_dir = Path(os.path.dirname(os.path.dirname(os.getcwd())), "data", "unique_records")

        unique_records_dir.mkdir(parents=True, exist_ok=True)  # Create the directory if it doesn't exist

        # Convert the set of unique records to a DataFrame
        df = pd.DataFrame(list(unique_records), columns=["Unique Records"])

        # Save to CSV
        file_path = unique_records_dir / filename
        df.to_csv(file_path, index=False)
        print(f"Unique records saved at: {file_path}")
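Because a set has no defined order, the rows written by `save_in_csv` can come out in a different order on each run. The following is a rough sketch of a variant that sorts the values and appends `.csv` when the extension is missing. It uses the standard `csv` module rather than pandas, and the name `save_sorted` and the temporary directory are hypothetical stand-ins for illustration.

```python
import csv
import tempfile
from pathlib import Path


def save_sorted(unique_records, directory, filename):
    """Write unique records to a CSV file in a stable, sorted order."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    if not filename.endswith(".csv"):
        filename += ".csv"  # make sure the file carries a CSV extension
    file_path = directory / filename
    with file_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Unique Records"])
        for value in sorted(unique_records, key=str):
            writer.writerow([value])
    return file_path


# Write to a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    saved = save_sorted({"Teen", "Adults", "Elders (65+)"}, tmp, "age_unique_records")
    saved_name = saved.name
    saved_lines = saved.read_text(encoding="utf-8").splitlines()

print(saved_name, saved_lines)
```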

### Notes:

- **"What brings you to therapy?"**  
  Use the `speciality` column to identify therapists who specialize in the client’s issue.

- **"What are the approaches of therapy?"**  
  Use the `thrapy_Way` and `expetise` columns to explain and recommend therapists based on their techniques.
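As a minimal sketch of how the first note could translate into a lookup, the snippet below checks whether a client's issue appears in a therapist's stringified `speciality` list. The helper `matches`, the case-insensitive comparison, and the three placeholder rows are assumptions made for illustration. They are not the project's actual recommendation logic.

```python
import ast


def matches(cell, wanted):
    """Return True if `wanted` appears in the stringified list `cell`."""
    values = {str(v).strip(", ").lower() for v in ast.literal_eval(cell)}
    return wanted.strip().lower() in values


# Placeholder rows shaped like the dataset (names are made up)
rows = [
    {"profile_title": "Therapist A", "speciality": "['Anxiety', 'Addiction', 'Trauma and PTSD']"},
    {"profile_title": "Therapist B", "speciality": "['Life Transitions', 'Anxiety', 'Self Esteem']"},
    {"profile_title": "Therapist C", "speciality": "['Depression', 'Chronic Illness']"},
]

hits = [row["profile_title"] for row in rows if matches(row["speciality"], "anxiety")]
print(hits)
```

The same predicate could be applied to `thrapy_Way` or `expetise` for the second note.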


In [5]:
obj = analysis(data_df)
data = obj.objective_unique_record(objective_col[2])
obj.save_in_csv(data, "expertise_unique_records")
data = obj.objective_unique_record(objective_col[3])
obj.save_in_csv(data, "speciality_unique_records")
data = obj.objective_unique_record(objective_col[7])
obj.save_in_csv(data, "age_unique_records")
data = obj.objective_unique_record(objective_col[10])
obj.save_in_csv(data, "therapy_ways_unique_records")

---------------------expetise---------------------
Total Unique Records: 1258
Unique records saved at: c:\Users\Umer\Desktop\Recomendation-Therapist-End-to-End-Project\data\unique_records\expertise_unique_records
--------------------speciality--------------------
Total Unique Records: 385
Unique records saved at: c:\Users\Umer\Desktop\Recomendation-Therapist-End-to-End-Project\data\unique_records\speciality_unique_records
-----------------------age------------------------
Total Unique Records: 10
Unique records saved at: c:\Users\Umer\Desktop\Recomendation-Therapist-End-to-End-Project\data\unique_records\age_unique_records
--------------------thrapy_Way--------------------
Total Unique Records: 899
Unique records saved at: c:\Users\Umer\Desktop\Recomendation-Therapist-End-to-End-Project\data\unique_records\therapy_ways_unique_records
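Some values in the sample rows carry a trailing comma (for example `'Adults,'` next to `'Adults'` in the `age` column), so the counts above may treat the same label as two distinct records. The sketch below shows one possible normalization step before counting. The function name `unique_values` and the choice to strip commas and spaces are assumptions, and the three sample cells are copied from the rows displayed earlier.

```python
import ast


def unique_values(cells, strip_chars=", "):
    """Collect unique list items after trimming stray commas and spaces."""
    unique = set()
    for cell in cells:
        for value in ast.literal_eval(cell):
            cleaned = str(value).strip(strip_chars)
            if cleaned:  # skip items that are empty after trimming
                unique.add(cleaned)
    return unique


sample_age = [
    "['Teen,', 'Adults,', 'Elders (65+)']",
    "['Adults']",
    "['Teen,', 'Adults']",
]

raw = {value for cell in sample_age for value in ast.literal_eval(cell)}
clean = unique_values(sample_age)
print(len(raw), sorted(raw))
print(len(clean), sorted(clean))
```

On this small sample, the raw set holds four entries while the cleaned set holds three, because `'Adults,'` and `'Adults'` collapse into one label.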


In [6]:
# Function to combine data into a paragraph
def generate_paragraph(row):
    return (
        f"Profile Title: {row['profile_title']} ({row['profle_suffix']})\n"
        f"Name: {row['profile_title']}\n"
        f"Address: {row['address']} ({row['zip']})\n"
        f"Availability: {row['availablity']}\n"
        f"Bio: {row['bio']}\n"
        f"License Number: {row['license_number']}\n"
        f"Specialties: {row['speciality']}\n"
        f"Expertise: {row['expetise']}\n"
        f"Therapy Approaches: {row['thrapy_Way']}\n"
        f"Education: {row['Education']}\n"
        f"Fees: {row['fee']}\n"
        f"Insurance: {row['insurance']}\n"
        f"Cities: {row['cities']}\n"
        f"Countries: {row['countries']}\n"
        f"Target Age Groups: {row['age']}\n"
        f"Participants: {row['participants']}\n"
        f"Ethnicity: {row['ethentisy']}\n"
        f"Profile URL: {row['url']}\n"
        f"Image URL: {row['image_url']}\n"
    )

# Apply the function to each row
data_df['combine_text'] = data_df.apply(generate_paragraph, axis=1)

# Print the combined text of the first record
print("Record After Combining".center(50, "-"))
print(data_df.loc[0, 'combine_text'])



--------------Record After Combining--------------
Profile Title: Rebecca Crawford (Registered Psychotherapist (Qualifying))
Name: Rebecca Crawford
Address: Alliston, ON L9R (['L4N', 'L4R', 'L9R'])
Availability: Available both in-person and online
Bio: As a woman, it’s so SO hard to walk in those shoes (or steel toe boots, flip flops, heels, whatever they are). You’ve been trying to do it all and be so strong. You’re amazing. And at the same time, you don’t have to sort through it alone: the toxic and codependent relationships, the body image pressure and never feeling like you’re enough, the huge plate load of things you take on every single day, and the sexual trauma or abuse that you carry as you try and just…live. It’s A LOT. And you deserve to find some peace and relief with it all. So what if all it takes is some therapy sessions to get that relief?
License Number: Licensed by Province of Ontario / 16052
Specialties: ["Women's Issues", 'Eating Disorders', 'Domestic Violence']
Exp
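The combined text embeds the list columns as raw Python literals, such as `['L4N', 'L4R', 'L9R']`. If plainer text is preferred, a formatting helper along the following lines could be applied to the list columns before building the paragraph. This is only a sketch: the name `readable` is hypothetical, and returning non-list strings unchanged is an assumption.

```python
import ast


def readable(cell):
    """Render a stringified list as comma-separated text; pass other text through."""
    if not isinstance(cell, str):
        return ""  # assumption: missing values become an empty string
    if cell.strip() == "set()":
        return ""  # empty set marker used in the ethentisy column
    try:
        values = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return cell  # not a Python literal, e.g. free text
    if isinstance(values, (list, tuple)):
        return ", ".join(str(v).strip(", ") for v in values)
    return cell


age_text = readable("['Teen,', 'Adults,', 'Elders (65+)']")
plain_text = readable("Available online only")
empty_text = readable("set()")
print(age_text)
print(plain_text)
```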

In [7]:
# Save the DataFrame, including the new combine_text column, next to the raw data
path_for_preprocessed = os.path.join(os.path.dirname(os.path.dirname(os.getcwd())), "data", "preprocessed_with_combine_text.csv")
data_df.to_csv(path_for_preprocessed, index=False)