# Project outline
0. Explanation of the project and assumptions made
1. Initialize
2. Upload the data
3. Statistics on the data
4. Merge given and enrichment data
5. Find length of 'about' section
6. Find keywords for the 'about' section
7. Use the Gemini LLM

# 0. Explanation of the project and assumptions made

Our project aims to develop an AI-driven tool that improves users' "about" sections by recommending a length, suggesting personalized keywords, and using the Gemini LLM to make the section more appealing. Essentially, the tool enhances end-user profiles. Our research focuses on optimizing the "about" section through three components: its length, the keywords used in it, and an LLM.

We make a few assumptions:
1. There are no "spam users": all the users in the given LinkedIn profiles dataset are real users who are trying to get a job.
2. The "about" section of each user has not changed over time, i.e., a user could not have changed his "about" section after finding a job (or, alternatively, after some time in which he has not found a job).
3. The trend of "Job-Hopping" is active*, meaning that the users in the dataset are not necessarily looking for stability in their workplace, but jump from one job to another.
4. No one wants to hire users with nothing impressive about them who went through a lot of jobs, as that might indicate that they were bad employees who got fired.
5. By number of jobs we mean the number of elements in the *"experience"* section, as we are interested in people who got a job in a company they were not part of, and not in people who got a promotion or made a career change inside the company they worked in.
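To make assumption 5 concrete, here is a minimal sketch on a hypothetical profile. The dictionary layout is an assumption modeled on the enrichment data's `experience` schema (section 4.1.2): one element per company, each holding a list of `positions`. A promotion adds a position inside an existing element, so it does not change the job count.

```python
# Hypothetical profile: a promotion at Company A (two positions) and one position at Company B
toy_profile = {
    "experience": [
        {"company": "Company A", "positions": [{"title": "Junior Analyst"}, {"title": "Senior Analyst"}]},
        {"company": "Company B", "positions": [{"title": "Data Scientist"}]},
    ]
}


def number_of_jobs(profile):
    """Jobs as defined in assumption 5: one per element of "experience"."""
    return len(profile.get("experience") or [])


def number_of_positions(profile):
    """All positions, which also counts promotions inside the same company."""
    return len([p for entry in (profile.get("experience") or []) for p in (entry.get("positions") or [])])


print(number_of_jobs(toy_profile))       # 2
print(number_of_positions(toy_profile))  # 3
```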

From assumption 1, we conclude that our statistics about the number of jobs people held are valid, as there are no redundant users that bias the calculations.\
Using assumption 2, we can assume that the "about" section is informative for our project, as it does not change during the user's "life-span" on the LinkedIn platform.\
Combining assumptions 3 and 4 lets us assume that users seek to get more jobs, and that the more jobs a user had, the better.

*About the "Job-Hopping" trend, see the following:
1. [Millennials: The Job-Hopping Generation](https://www.gallup.com/workplace/231587/millennials-job-hopping-generation.aspx)
2. [Article from Globes about this topic (from 2001!)](https://www.globes.co.il/news/article.aspx?did=524717)

The first source supports our assumption, as it states that six in 10 millennials are open to new job opportunities.

# 1. Initialize

In [0]:
%pip install langchain_google_genai

In [0]:
import json

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pyspark
from pyspark.ml.feature import OneHotEncoder, StringIndexer
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import *  # note: also shadows Python built-ins such as max, min, sum, abs and round
from pyspark.sql.types import *

pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

spark = SparkSession.builder.getOrCreate()



# 2. Upload the data

## Uploading the given data

In [0]:
profiles = spark.read.parquet('/linkedin/people')
companies = spark.read.parquet('/linkedin/companies')

## Uploading the enrichment data

In [0]:
# Optional: download the enrichment datasets from Kaggle.
# Credentials are read from ~/.kaggle/kaggle.json or the KAGGLE_USERNAME / KAGGLE_KEY environment variables;
# never hard-code an API key in the notebook.
# %pip install kaggle
# from kaggle.api.kaggle_api_extended import KaggleApi
# api = KaggleApi()
# api.authenticate()
# api.dataset_download_files("manishkumar7432698/linkedinuserprofiles",
#                            path="/Workspace/Users/ido.iutcu@campus.technion.ac.il", unzip=True)

In [0]:
enrich_profiles = pd.read_csv("/Workspace/Users/ido.iutcu@campus.technion.ac.il/LinkedIn people profiles datasets.csv")
# Converted to a Spark DataFrame in section 4.1.2

In [0]:
enrich_companies = pd.read_csv("/Workspace/Users/ido.iutcu@campus.technion.ac.il/LinkedIn company information datasets (Public web data).csv")
# enrich_companies.display()

# 3. Statistics on the data

### Taking a look at the data
Print the data schema and some important statistics about it.

In [0]:
# profiles.printSchema()

In [0]:
# Print statistics about profiles
total_num_profiles = profiles.count()
num_profiles_with = profiles.filter("about is not null").count()
num_seekjob_profiles = profiles.filter("current_company.company_id is null").count()
num_seekjob_profiles_with = profiles.filter("about is not null AND current_company.company_id is null").count()
num_seekjob_profiles_without = profiles.filter("about is null AND current_company.company_id is null").count()

# Print statistics about companies data
# print("Number of companies:", companies.count())
# print("Number of unique industries in companies:", companies.select("industries").distinct().count())
# print("Number of companies with more than 100 employees:", companies.filter("num_employees > 100").count())
# print("Number of companies founded before 2000:", companies.filter("founding_year < 2000").count())
# print("Number of companies with IPO:", companies.filter("ipo_date is not null").count())

# Print statistics about profiles data
# print("Number of profiles:", total_num_profiles)
# print("Number of profiles with an about specified:", num_profiles_with)
# print("Percentage of profiles with an about specified:", num_profiles_with / total_num_profiles)
# print("Number of profiles seeking for a job:", num_seekjob_profiles)
# print("Percentage of profiles seeking for a job:", num_seekjob_profiles / total_num_profiles)
# print("Percentage of profiles with an about specified seeking for a job:", num_seekjob_profiles_with / num_profiles_with)
# print("Percentage of profiles without an about specified seeking for a job:", num_seekjob_profiles_without / (total_num_profiles-num_profiles_with))
# print("Number of profiles with an experience specified:", profiles.filter("experience is not null").count())
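The percentages in the prints above are conditional rates: among profiles with (or without) an "about" section, the share that has no current company. A minimal pure-Python sketch of the same computation on hypothetical toy records, where the two tuple fields stand in for `about` and `current_company.company_id`:

```python
# Hypothetical toy records: (about, company_id); company_id None means "seeking a job"
toy_records = [
    ("Data engineer who loves pipelines", 101),
    ("Marketing lead", None),
    ("Nurse with 10 years of experience", 202),
    (None, None),
    (None, 303),
    (None, None),
]


def seeking_rate(rows):
    """Share of rows with no current company."""
    if not rows:
        return float("nan")
    return len([r for r in rows if r[1] is None]) / len(rows)


toy_with_about = [r for r in toy_records if r[0] is not None]
toy_without_about = [r for r in toy_records if r[0] is None]

print(f"seeking a job | with about:    {seeking_rate(toy_with_about):.2f}")     # 0.33
print(f"seeking a job | without about: {seeking_rate(toy_without_about):.2f}")  # 0.67
```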

We can see that users with something in their *"about"* section are more likely to currently hold a job (i.e., to have a current company) than users who have nothing written in their *"about"* section.\
Let's divide the profiles table into two tables: one for users who have an "about" section (i.e., not null), and one for users who don't.

Next, we will check the average number of different jobs users have held, once for the profiles_with_about table and once for the profiles_without_about table.\
Note that we count the number of elements in *"experience"*, as we are interested in how many jobs the user held in different companies, not in the same one.
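One detail matters when averaging the size of the experience array: under Spark's default (non-ANSI) settings, `size` returns -1, not null, for a null array, so profiles without any experience pull the average down. A minimal pure-Python sketch on hypothetical data; counting a missing experience list as 0 jobs is an assumption (excluding such profiles is the other reasonable choice):

```python
import statistics

# Hypothetical profiles: (about, experience list or None)
toy_profiles = [
    ("Backend developer", [{"company": "A"}, {"company": "B"}, {"company": "C"}]),
    ("Teacher", [{"company": "D"}]),
    (None, [{"company": "E"}]),
    (None, None),  # no experience at all
]


def spark_default_size(arr):
    """Mimics Spark's size() under default (non-ANSI) settings: -1 for a null array."""
    return -1 if arr is None else len(arr)


def jobs_or_zero(arr):
    """Number of jobs, counting a missing experience list as 0."""
    return 0 if arr is None else len(arr)


toy_without_about = [exp for about, exp in toy_profiles if about is None]
print(statistics.fmean([spark_default_size(e) for e in toy_without_about]))  # 0.0, pulled down by the -1
print(statistics.fmean([jobs_or_zero(e) for e in toy_without_about]))        # 0.5
```

In Spark SQL the same guard can be written as `IF(experience IS NULL, 0, size(experience))`.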

In [0]:
profiles_with_about = profiles.filter("about is not null")
profiles_without_about = profiles.filter("about is null")

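# size() returns -1 for a null array under Spark's default settings, so a missing experience is counted as 0 jobs
avg_positions_held = profiles.selectExpr("avg(IF(experience IS NULL, 0, size(experience))) as avg_positions_held").collect()[0][0]
avg_positions_held_with = profiles_with_about.selectExpr("avg(IF(experience IS NULL, 0, size(experience))) as avg_positions_held_with").collect()[0][0]
avg_positions_held_without = profiles_without_about.selectExpr("avg(IF(experience IS NULL, 0, size(experience))) as avg_positions_held_without").collect()[0][0]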

# print("Average number of different jobs that user has held:", avg_positions_held)
# print("Average number of different jobs that user with about section has held:", avg_positions_held_with)
# print("Average number of different jobs that user without about section has held:", avg_positions_held_without)

Thus, users with nothing in their *"about"* section held fewer jobs on average than users with something in their *"about"* section.\
Next, we will explore whether there is any relation between the length of the *"about"* section and the number of jobs a user held.

In [0]:
# Collect the "about" length and the number of experience entries in ONE pass so the two lists stay row-aligned
# (two separate collect() calls are not guaranteed to return rows in the same order).
# A null experience array is counted as 0 jobs: under Spark's default settings size(null) returns -1.
rows = profiles_with_about.selectExpr(
    "length(about) AS about_length",
    "IF(experience IS NULL, 0, size(experience)) AS experience_size",
).collect()
about_lengths = np.array([row.about_length for row in rows])
experience_sizes = np.array([row.experience_size for row in rows])

# Define the bin intervals for the about lengths ("+ 251" keeps the longest section inside the last bin)
bin_intervals = np.arange(0, about_lengths.max() + 251, 250)

# Average experience size per bin (NaN for empty bins)
grouped_experience_sizes = [experience_sizes[(about_lengths >= lo) & (about_lengths < hi)].mean()
                            for lo, hi in zip(bin_intervals[:-1], bin_intervals[1:])]

# Plotting the bar chart
# plt.figure(figsize=(10, 6))
# plt.bar(bin_intervals[:-1], grouped_experience_sizes, width=240, color='skyblue')
# plt.title('Experience Sizes vs About Lengths')
# plt.xlabel('About Lengths (in intervals of 250)')
# plt.ylabel('Average Experience Sizes')
# plt.grid(axis='y')
# plt.show()

The chart above shows another interesting insight: the average number of jobs appears to increase monotonically with the length of the *"about"* section, i.e., the longer the *"about"* section, the more jobs users held on average.
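A minimal pure-Python sketch of the binning used above (intervals of 250 characters) together with a check of the "monotone increasing" observation; the data are hypothetical and empty bins are skipped. Computing the bin index by integer division guarantees that every length lands in exactly one bin; with edges built as `np.arange(0, max_length + 250, 250)`, a longest section whose length is an exact multiple of 250 sits on the last edge and falls outside every bin.

```python
from collections import defaultdict


def bin_averages(about_lengths, experience_sizes, width=250):
    """Average number of jobs per about-length bin [k*width, (k+1)*width); empty bins are omitted."""
    totals, counts = defaultdict(int), defaultdict(int)
    for n_chars, n_jobs in zip(about_lengths, experience_sizes):
        k = n_chars // width  # integer bin index: every length lands in exactly one bin
        totals[k] += n_jobs
        counts[k] += 1
    return {k * width: totals[k] / counts[k] for k in sorted(counts)}


def is_non_decreasing(values):
    """True if the sequence never decreases (assumes no NaN values)."""
    return list(values) == sorted(values)


# Hypothetical data; 500 sits exactly on a bin edge and goes to the [500, 750) bin
toy_about_lengths = [40, 180, 260, 490, 500, 740]
toy_experience_sizes = [1, 3, 2, 4, 4, 6]

toy_averages = bin_averages(toy_about_lengths, toy_experience_sizes)
print(toy_averages)                                    # {0: 2.0, 250: 3.0, 500: 5.0}
print(is_non_decreasing(list(toy_averages.values())))  # True
```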

Let's also check whether there is a connection between the average number of jobs and the length of the 'about' section within individual industries.

In [0]:
from pyspark.sql.functions import col

only_id_industry_companies = companies.select(col("id"), col("industries"))

# Join the profiles_with_about and companies DataFrames
profiles_with_about_ind = profiles_with_about.join(only_id_industry_companies, profiles_with_about["current_company.company_id"] == only_id_industry_companies["id"], "left")
# profiles_with_about_ind.display()

In [0]:
profiles_with_about_ind_extract = profiles_with_about_ind.select(col("about"), col("industries"))
profiles_with_about_ind_extract = profiles_with_about_ind_extract.filter("industries is not null")

profiles_with_about_ind_exp_extract = profiles_with_about_ind.select(col("about"), col("industries"), col("experience"))
profiles_with_about_ind_exp_extract = profiles_with_about_ind_exp_extract.filter("industries is not null")

# Group by industries and count
top10_industries = profiles_with_about_ind_extract.groupBy("industries").count()

# Sort by count in descending order
top10_industries = top10_industries.orderBy("count", ascending=False)

# Select top 10 industries
top10_industries = top10_industries.limit(10)

# Display the result
# top10_industries.display()

In [0]:
# Create a list of top 10 industries
top10_industries_list = top10_industries.select("industries").rdd.flatMap(lambda x: x).collect()

# Filter rows in profiles_with_about_ind_extract where industry is in top 10 industries list
filtered_profiles = profiles_with_about_ind_extract.filter(profiles_with_about_ind_extract.industries.isin(top10_industries_list))

# Convert the "about" column into its length
filtered_profiles = filtered_profiles.withColumn("about_length", length(filtered_profiles.about))
filtered_profiles = filtered_profiles.select(col("about_length"), col("industries"))

# Group the data by the "industry" column and calculate the average length of the "about" section
avg_about_length_by_industry = filtered_profiles.groupby("industries").agg({"about_length": "avg"})

# Convert the DataFrame to Pandas for easier plotting
avg_about_length_by_industry_pd = avg_about_length_by_industry.toPandas()

# Plotting the bar chart
# plt.figure(figsize=(15, 9))
# plt.bar(avg_about_length_by_industry_pd["industries"], avg_about_length_by_industry_pd["avg(about_length)"], color='skyblue')
# plt.title('Average Length of About Section by Industry')
# plt.xlabel('Industry')
# plt.ylabel('Average Length of About Section')
# plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
# plt.grid(axis='y')  # Add gridlines only along the y-axis
# plt.tight_layout()  # Adjust layout to prevent clipping of labels
# plt.show()

From the bar plot above we can see that the average length of the *"about"* section differs between industries, so making your *"about"* section longer is not always the right choice. Therefore, our task of finding a custom length for each LinkedIn user is a real task, with no simple solution such as making the *"about"* section as long as possible.

Next, we will look at experience sizes vs. about lengths per industry.

In [0]:
def hist_by_ind(industry):
    # Select all rows of the given industry from profiles_with_about_ind_exp_extract
    selected_industry_profiles = profiles_with_about_ind_exp_extract.filter(profiles_with_about_ind_exp_extract.industries == industry)

    # Collect the "about" length and the number of experience entries in one pass so the two lists stay row-aligned;
    # a null experience array is counted as 0 jobs (size() returns -1 for null under Spark's default settings)
    rows = selected_industry_profiles.selectExpr(
        "length(about) AS about_length",
        "IF(experience IS NULL, 0, size(experience)) AS experience_size",
    ).collect()
    if not rows:
        print(f"No profiles found for industry {industry}")
        return
    about_lengths = np.array([row.about_length for row in rows])
    experience_sizes = np.array([row.experience_size for row in rows])

    # Define the bin intervals for the about lengths ("+ 251" keeps the longest section inside the last bin)
    bin_intervals = np.arange(0, about_lengths.max() + 251, 250)

    # Average experience size per bin (NaN for empty bins)
    grouped_experience_sizes = [experience_sizes[(about_lengths >= lo) & (about_lengths < hi)].mean()
                                for lo, hi in zip(bin_intervals[:-1], bin_intervals[1:])]

    # Plotting the bar chart
    plt.figure(figsize=(10, 6))
    plt.bar(bin_intervals[:-1], grouped_experience_sizes, width=240, color='skyblue')
    plt.title(f'Experience Sizes vs About Lengths in industry {industry}')
    plt.xlabel('About Lengths (in intervals of 250)')
    plt.ylabel('Average Experience Sizes')
    plt.grid(axis='y')
    plt.show()

In [0]:
# hist_by_ind("Real Estate")

In [0]:
# hist_by_ind("Higher Education")

In [0]:
# hist_by_ind("Hospitals and Health Care")

In [0]:
# hist_by_ind("IT Services and IT Consulting")

In [0]:
# hist_by_ind("Retail")

As the five histograms above show, each of the five largest industries has a different 'about' length that is associated with the highest average number of jobs.

Therefore, it is indeed a good idea to try to find the optimal length of the 'about' section for each profile.
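One simple way to turn per-industry histograms into a recommendation is to pick, for each industry, the length bin with the highest average number of jobs. A minimal sketch with hypothetical bin averages (illustrative numbers, not results from the dataset); the `min_profiles` threshold is an assumption of ours, added because averages over very few profiles are noisy:

```python
# Hypothetical per-industry results: {industry: {bin_start: (average_jobs, number_of_profiles)}}
toy_binned = {
    "Real Estate": {0: (2.1, 40), 250: (2.9, 35), 500: (2.4, 12)},
    "Higher Education": {0: (1.8, 50), 250: (2.2, 44), 500: (3.1, 30), 750: (4.0, 3)},
}


def best_length_range(bins, min_profiles=10, width=250):
    """(start, end) of the about-length bin with the highest average number of jobs, ignoring sparse bins."""
    eligible = {start: avg_jobs for start, (avg_jobs, n) in bins.items() if n >= min_profiles}
    if not eligible:
        return None
    start = sorted(eligible, key=eligible.get)[-1]  # bin start with the highest average
    return start, start + width


for toy_industry, toy_bins in toy_binned.items():
    print(toy_industry, best_length_range(toy_bins))
# Real Estate (250, 500)
# Higher Education (500, 750)
```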

# 4. Merge given and enrichment data

## 4.1 Extract the features for the Regression Tree model

### 4.1.1 From given data

In [0]:
only_id_industry_companies = companies.select(col("id"), col("industries"))

# Join the profiles_with_about and companies DataFrames
profiles_with_about_ind = profiles_with_about.join(only_id_industry_companies, profiles_with_about["current_company.company_id"] == only_id_industry_companies["id"], "left")

profiles_for_fakti = profiles_with_about_ind.select(col("about"), col("education"), col("certifications"), col("volunteer_experience"))

### 4.1.2 From enrichment data

In [0]:
enrich_profiles = enrich_profiles[["name", "about", "experience", "education", "certifications", "recommendations", "volunteer_experience"]]

# Remove rows with missing values before converting: pandas stores them as NaN (a float),
# which can break Spark's schema inference for string columns when Arrow is not used
enrich_profiles = enrich_profiles.dropna()

# Convert pandas dataframe to pyspark dataframe
enrich_profiles_spark = spark.createDataFrame(enrich_profiles)

experience_struct_type = StructType([
    StructField("company", StringType()),
    StructField("company_id", StringType()),
    StructField("industry", StringType()),
    StructField("location", StringType()),
    StructField("positions", ArrayType(StructType([
        StructField("description", StringType()),
        StructField("duration", StringType()),
        StructField("duration_short", StringType()),
        StructField("end_date", StringType()),
        StructField("start_date", StringType()),
        StructField("subtitle", StringType()),
        StructField("title", StringType())
    ]))),
    StructField("url", StringType())
])

education_struct_type = StructType([
    StructField("degree", StringType()),
    StructField("end_year", StringType()),
    StructField("field", StringType(), nullable=True),
    StructField("meta", StringType()),
    StructField("start_year", StringType()),
    StructField("title", StringType()),
    StructField("url", StringType())
])

certifications_struct_type = StructType([
    StructField("meta", StringType()),
    StructField("subtitle", StringType()),
    StructField("title", StringType())
])


volunteer_experience_struct_type = StructType([
    StructField("cause", StringType()),
    StructField("duration", StringType()),
    StructField("duration_short", StringType()),
    StructField("end_date", StringType()),
    StructField("info", StringType()),
    StructField("start_date", StringType()),
    StructField("subtitle", StringType()),
    StructField("title", StringType())
])

# Define a UDF to apply json.loads
def parse_json(s):
    return json.loads(s)

# Parse each JSON-string column into an array of structs with the matching schema.
# "experience" is not parsed because it is dropped below and not used as a feature;
# "recommendations" has no schema defined, so it is left unparsed as well.
json_columns = {
    "certifications": certifications_struct_type,
    "education": education_struct_type,
    "volunteer_experience": volunteer_experience_struct_type,
}
for column_name, struct_type in json_columns.items():
    parse_json_udf = udf(parse_json, ArrayType(struct_type))
    enrich_profiles_spark = enrich_profiles_spark.withColumn(column_name + "_", parse_json_udf(col(column_name)))

enrich_profiles_spark = enrich_profiles_spark.drop("experience", "experience_", "education", "certifications", "recommendations", "volunteer_experience")

enrich_profiles_spark = enrich_profiles_spark \
    .withColumnRenamed("education_", "education") \
    .withColumnRenamed("certifications_", "certifications") \
    .withColumnRenamed("volunteer_experience_", "volunteer_experience")

enrich_profiles_spark = enrich_profiles_spark.select(col("about"), col("education"), col("certifications"), col("volunteer_experience"))
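The `parse_json` UDF calls `json.loads` directly, which raises on a missing or malformed value and fails the whole Spark job; `dropna()` only removes missing values, not malformed ones. A defensive variant may be worth considering. A minimal sketch; returning `None` makes Spark store a null, and whether to keep or drop such rows afterwards is a design choice left open here:

```python
import json


def parse_json_safe(s):
    """Parse a JSON string; return None (a null in Spark) for missing or malformed input instead of raising."""
    if s is None:
        return None
    try:
        return json.loads(s)
    except (TypeError, ValueError):  # json.JSONDecodeError is a subclass of ValueError
        return None


print(parse_json_safe('[{"title": "AWS Certified", "subtitle": "Amazon", "meta": null}]'))
print(parse_json_safe("not json"))    # None
print(parse_json_safe(float("nan")))  # None (a pandas NaN is a float, not a string)
```

It can be registered the same way as the original, e.g. `udf(parse_json_safe, ArrayType(education_struct_type))`.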

## 4.2 Merge

In [0]:
merge_profiles1 = profiles_for_fakti
merge_profiles2 = enrich_profiles_spark

merged_profiles = merge_profiles1.union(merge_profiles2)
features_for_rt = merged_profiles
# features_for_rt.display()

## 4.3 Remove empty elements 

In [0]:
from pyspark.sql.functions import size

features_for_rt = features_for_rt.where(size(features_for_rt.education) > 0)
features_for_rt = features_for_rt.where(size(features_for_rt.certifications) > 0)
features_for_rt = features_for_rt.where(size(features_for_rt.volunteer_experience) > 0)

# features_for_rt.display()

# 5. Find length of 'about' section

We will find the optimal length of the 'about' section (measured in words) using a Regression Tree model.

## 5.1 First try - Regression Tree

### Training Tree

In [0]:
def get_subtitle_str(arr):
    # Join the "subtitle" of every element (non-strings become ""), terminated by " |"
    return ' '.join([d["subtitle"] if isinstance(d["subtitle"], str) else "" for d in arr]) + " |"

def get_title_str(arr):
    # Join the "title" of every element (non-strings become ""), terminated by " |"
    return ' '.join([d["title"] if isinstance(d["title"], str) else "" for d in arr]) + " |"

def get_num_of_words(string):
    # Number of space-separated tokens in the "about" text (the regression target)
    return len(string.split(" "))

udf_dict_to_string_subtitle = udf(get_subtitle_str, StringType())
udf_dict_to_string_title = udf(get_title_str, StringType())
udf_about_num_of_words = udf(get_num_of_words, IntegerType())

# Text features: certification subtitles and volunteer-experience titles
df_fakter = features_for_rt.withColumn("certificationString", udf_dict_to_string_subtitle(features_for_rt["certifications"]))
df_fakter = df_fakter.withColumn("volunteerString", udf_dict_to_string_title(df_fakter["volunteer_experience"]))

# Concatenate the text features into one column, replacing nulls with "none"
columns_to_concat = ["certificationString", "volunteerString"]
df_concatenated = df_fakter.withColumn("all_text", F.concat_ws(" ", *[F.when(F.col(c).isNull(), F.lit("none")).otherwise(F.col(c)) for c in columns_to_concat]))

# Keep the text and the target (number of words in "about")
df_just_txt = df_concatenated.select("all_text", "about")
df_just_txt = df_just_txt.withColumn("target", udf_about_num_of_words(df_just_txt["about"]))
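A plain-Python sketch of what `get_subtitle_str` and `get_num_of_words` compute for a single row, on hypothetical values. Subtitles are joined with spaces (a non-string subtitle leaves an empty slot) and the list is terminated by " |". For the target, note that `split(" ")` treats every single space as a separator, so a double space produces an empty "word" and a newline does not split at all; `split()` with no argument splits on any run of whitespace and may be the more faithful word count.

```python
def join_subtitles(entries):
    """Same logic as get_subtitle_str: join the "subtitle" values, non-strings become ""."""
    return " ".join(e["subtitle"] if isinstance(e["subtitle"], str) else "" for e in entries) + " |"


# Hypothetical certifications (dicts stand in for Spark Row objects, which also support row["subtitle"])
toy_certifications = [{"subtitle": "Coursera"}, {"subtitle": None}, {"subtitle": "Google"}]
print(join_subtitles(toy_certifications))  # Coursera  Google |

# Word counting: split(" ") vs. split()
print(len("Data  analyst".split(" ")))  # 3 (the double space yields an empty "word")
print(len("Data  analyst".split()))     # 2
print(len("Data\nanalyst".split(" ")))  # 1 (a newline is not a separator)
print(len("Data\nanalyst".split()))     # 2
```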


In [0]:
from pyspark.ml import Pipeline
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.feature import StringIndexer, IndexToString
import sparknlp
from sparknlp.base import *       # DocumentAssembler, EmbeddingsFinisher, ...
from sparknlp.annotator import *  # SentenceDetector, BertSentenceEmbeddings, ...
from pyspark.ml.classification import RandomForestClassifier, LogisticRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.linalg import Vectors, VectorUDT
from pyspark.sql.types import *
documentAssembler_4 = DocumentAssembler().setInputCol("all_text").setOutputCol("doc")
sentence_4 = SentenceDetector() \
    .setInputCols(["doc"]) \
    .setOutputCol("sentence") \
    .setCustomBounds(["^KOPER^"]) \
    .setUseCustomBoundsOnly(True)
embeddings_4 = BertSentenceEmbeddings.pretrained("sent_small_bert_L2_128", "en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("bert_embeddings")
embeddingsFinisher_4 = EmbeddingsFinisher() \
    .setInputCols(["bert_embeddings"]) \
    .setOutputCols("finished_embeddings") \
    .setOutputAsVector(True)
pipeline_4 = Pipeline().setStages([documentAssembler_4, sentence_4 ,embeddings_4,embeddingsFinisher_4]).fit(df_just_txt)
df_tokenized = pipeline_4.transform(df_just_txt)
df_tokenized_toModel = df_tokenized.select("target", "finished_embeddings")

# Define a UDF to extract "values" property from each element
def extract_values(embeddings):
    return embeddings[0].toArray().tolist()

# Register the UDF
extract_values_udf = udf(extract_values, ArrayType(DoubleType()))
df_ready = df_tokenized_toModel.withColumn("values", extract_values_udf(df_tokenized_toModel["finished_embeddings"]))
df_ready = df_ready.select(["target", "values"])
list_to_vector_udf = udf(lambda l: Vectors.dense(l), VectorUDT())
df_ready = df_ready.withColumn("features", list_to_vector_udf("values"))


sent_small_bert_L2_128 download started this may take some time.
Approximate size to download 16.1 MB
[ | ][ / ][ — ][ \ ][ | ][OK!]


In [0]:

from pyspark.ml.regression import DecisionTreeRegressor
from pyspark.ml.evaluation import RegressionEvaluator
train_set, test_set = df_ready.randomSplit([0.8,0.2], seed=42)
rt = DecisionTreeRegressor(featuresCol="features", labelCol="target", maxDepth=16)
rtModel = rt.fit(train_set)
train_preds = rtModel.transform(train_set)
evaluator_train = RegressionEvaluator(labelCol="target", predictionCol="prediction")
print("RMSE train:")
print(evaluator_train.evaluate(train_preds))
evaluator_test = RegressionEvaluator(labelCol="target", predictionCol="prediction")
test_preds = rtModel.transform(test_set)
print("RMSE test:")
print(evaluator_test.evaluate(test_preds))


RMSE train:
55.1913282546854
RMSE test:
100.2113530333003
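The gap between the train RMSE (about 55 words) and the test RMSE (about 100 words) suggests that a tree of depth 16 overfits. A useful reference point is the RMSE of a trivial model that always predicts the mean training word count: if the tree does not beat it on the test set, the embeddings add little. A minimal pure-Python sketch of both quantities on hypothetical numbers:

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error (RegressionEvaluator's default metric)."""
    return math.sqrt(statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


# Hypothetical word counts of "about" sections
toy_train_targets = [40, 120, 80, 200]
toy_test_targets = [60, 150, 90]

# Baseline model: always predict the mean word count of the training set
toy_baseline = statistics.fmean(toy_train_targets)  # 110.0
print(rmse(toy_test_targets, [toy_baseline] * len(toy_test_targets)))  # sqrt(1500) ≈ 38.73
```

In the notebook, the same baseline could be scored by adding a constant column, e.g. `test_set.withColumn("prediction", F.lit(mean_target))` where `mean_target` (a hypothetical name) is `train_set.agg(F.avg("target")).first()[0]`, and passing the result to `evaluator_test.evaluate`.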


## 5.2 Second try - Meta Industries
We will try to use the mean length (in words) of the 'about' section in each meta industry.
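This second try is a lookup model: map each industry to its meta industry, then predict the average word count of that meta industry. A minimal sketch on hypothetical data, with one assumption that is not in the notebook: an industry that is missing from the mapping (or whose meta industry has no training rows) falls back to the overall mean rather than getting no prediction.

```python
import statistics
from collections import defaultdict

# Hypothetical excerpt of the industry -> meta industry mapping, and toy (industry, words in about) rows
toy_meta = {"Software Development": "Technology", "Banking": "Financial and Investment"}
toy_rows = [("Software Development", 120), ("Software Development", 80), ("Banking", 60)]


def fit_group_means(rows, mapping):
    """Mean word count per meta industry, plus the overall mean as a fallback."""
    groups = defaultdict(list)
    for industry, n_words in rows:
        groups[mapping.get(industry)].append(n_words)
    means = {meta: statistics.fmean(words) for meta, words in groups.items() if meta is not None}
    return means, statistics.fmean([n_words for _, n_words in rows])


def predict_words(industry, means, overall_mean, mapping):
    """Predicted word count: the meta-industry mean, or the overall mean if the industry is unknown."""
    return means.get(mapping.get(industry), overall_mean)


toy_means, toy_overall = fit_group_means(toy_rows, toy_meta)
print(predict_words("Software Development", toy_means, toy_overall, toy_meta))  # 100.0
print(predict_words("Unlisted Industry", toy_means, toy_overall, toy_meta))     # overall mean, 260 / 3
```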

In [0]:
meta_industries_12 = {
    'Furniture and Home Furnishings Manufacturing': 'Manufacturing',
    'Investment Banking': 'Financial and Investment',
    'Architecture and Planning': 'Services',
    'Wholesale': 'Services',
    'Travel Arrangements': 'Services',
    'Ranching': 'Miscellaneous',
    'Hospitals and Health Care': 'Healthcare and Medical',
    'Book and Periodical Publishing': 'Services',
    'Printing Services': 'Services',
    'Professional Training and Coaching': 'Services',
    'Computers and Electronics Manufacturing': 'Manufacturing',
    'Shipbuilding': 'Manufacturing',
    'Public Policy Offices': 'Government and Public Policy',
    'Software Development': 'Technology',
    'Outsourcing and Offshoring Consulting': 'Services',
    'Retail Groceries': 'Retail and Consumer Goods',
    'Education Administration Programs': 'Education and Training',
    'Plastics Manufacturing': 'Manufacturing',
    'Renewable Energy Semiconductor Manufacturing': 'Manufacturing',
    'Computer Networking Products': 'Technology',
    'Events Services': 'Services',
    'Information Services': 'Services',
    'Food and Beverage Services': 'Services',
    'Semiconductor Manufacturing': 'Manufacturing',
    'Business Consulting and Services': 'Services',
    'Insurance': 'Services',
    'Financial Services': 'Services',
    'Wireless Services': 'Services',
    'Computer Hardware Manufacturing': 'Technology',
    'Public Safety': 'Services',
    'Maritime Transportation': 'Transportation and Logistics',
    'Tobacco Manufacturing': 'Manufacturing',
    'Writing and Editing': 'Services',
    'Veterinary Services': 'Services',
    'Staffing and Recruiting': 'Services',
    'Accounting': 'Services',
    'International Affairs': 'Government and Public Policy',
    'Spectator Sports': 'Miscellaneous',
    'Glass, Ceramics and Concrete Manufacturing': 'Manufacturing',
    'Chemical Manufacturing': 'Manufacturing',
    'Mining': 'Miscellaneous',
    'E-Learning Providers': 'Technology',
    'Security and Investigations': 'Services',
    'Translation and Localization': 'Services',
    'Automation Machinery Manufacturing': 'Technology',
    'Computer and Network Security': 'Technology',
    'Political Organizations': 'Government and Public Policy',
    'Environmental Services': 'Government and Public Policy',
    'Oil and Gas': 'Miscellaneous',
    'Real Estate': 'Real Estate and Construction',
    'Think Tanks': 'Government and Public Policy',
    'Executive Offices': 'Miscellaneous',
    'Law Practice': 'Services',
    'Nanotechnology Research': 'Miscellaneous',
    'International Trade and Development': 'Government and Public Policy',
    'Personal Care Product Manufacturing': 'Manufacturing',
    'Philanthropic Fundraising Services': 'Services',
    'Entertainment Providers': 'Media and Entertainment',
    'Market Research': 'Media and Entertainment',
    'Movies, Videos, and Sound': 'Media and Entertainment',
    'Sporting Goods Manufacturing': 'Manufacturing',
    'Graphic Design': 'Services',
    'Technology, Information and Internet': 'Technology',
    'IT Services and IT Consulting': 'Technology',
    'Retail Office Equipment': 'Retail and Consumer Goods',
    'Wholesale Import and Export': 'Services',
    'Capital Markets': 'Financial and Investment',
    'Law Enforcement': 'Services',
    'Freight and Package Transportation': 'Transportation and Logistics',
    'Industrial Machinery Manufacturing': 'Manufacturing',
    'Non-profit Organizations': 'Miscellaneous',
    'Retail Art Supplies': 'Retail and Consumer Goods',
    'Animation and Post-production': 'Media and Entertainment',
    'Transportation, Logistics, Supply Chain and Storage': 'Transportation and Logistics',
    'Aviation and Aerospace Component Manufacturing': 'Transportation and Logistics',
    'Fundraising': 'Financial and Investment',
    'Railroad Equipment Manufacturing': 'Transportation and Logistics',
    'Construction': 'Real Estate and Construction',
    'Investment Management': 'Financial and Investment',
    'Utilities': 'Miscellaneous',
    'Retail Luxury Goods and Jewelry': 'Retail and Consumer Goods',
    'Warehousing and Storage': 'Transportation and Logistics',
    'Media Production': 'Media and Entertainment',
    'Gambling Facilities and Casinos': 'Media and Entertainment',
    'Defense and Space Manufacturing': 'Manufacturing',
    'Facilities Services': 'Services',
    'Government Relations Services': 'Government and Public Policy',
    'Advertising Services': 'Media and Entertainment',
    'Paper and Forest Product Manufacturing': 'Manufacturing',
    'Packaging and Containers Manufacturing': 'Manufacturing',
    'Telecommunications': 'Technology',
    'Medical Equipment Manufacturing': 'Healthcare and Medical',
    'Beverage Manufacturing': 'Manufacturing',
    'Restaurants': 'Retail and Consumer Goods',
    'Leasing Non-residential Real Estate': 'Real Estate and Construction',
    'Newspaper Publishing': 'Media and Entertainment',
    'Armed Forces': 'Miscellaneous',
    'Appliances, Electrical, and Electronics Manufacturing': 'Manufacturing',
    'Hospitality': 'Services',
    'Pharmaceutical Manufacturing': 'Healthcare and Medical',
    'Research Services': 'Services',
    'Retail Apparel and Fashion': 'Retail and Consumer Goods',
    'Photography': 'Media and Entertainment',
    'Wellness and Fitness Services': 'Services',
    'Truck Transportation': 'Transportation and Logistics',
    'Consumer Services': 'Services',
    'Wholesale Building Materials': 'Services',
    'Human Resources Services': 'Services',
    'Airlines and Aviation': 'Transportation and Logistics',
    'Machinery Manufacturing': 'Manufacturing',
    'Individual and Family Services': 'Services',
    'Motor Vehicle Manufacturing': 'Manufacturing',
    'Performing Arts': 'Media and Entertainment',
    'Museums, Historical Sites, and Zoos': 'Media and Entertainment',
    'Broadcast Media Production and Distribution': 'Media and Entertainment',
    'Banking': 'Financial and Investment',
    'Recreational Facilities': 'Miscellaneous',
    'Government Administration': 'Government and Public Policy',
    'Public Relations and Communications Services': 'Media and Entertainment',
    'Fisheries': 'Miscellaneous',
    'Medical Practices': 'Healthcare and Medical',
    'Religious Institutions': 'Miscellaneous',
    'Online Audio and Video Media': 'Media and Entertainment',
    'Artists and Writers': 'Miscellaneous',
    'Biotechnology Research': 'Healthcare and Medical',
    'Legal Services': 'Services',
    'Retail': 'Retail and Consumer Goods',
    'Civil Engineering': 'Services',
    'Libraries': 'Miscellaneous',
    'Alternative Dispute Resolution': 'Miscellaneous',
    'Manufacturing': 'Miscellaneous',
    'Design Services': 'Services',
    'Dairy Product Manufacturing': 'Manufacturing',
    'Higher Education': 'Education and Training',
    'Civic and Social Organizations': 'Miscellaneous',
    'Textile Manufacturing': 'Manufacturing',
    'Venture Capital and Private Equity Principals': 'Financial and Investment',
    'Mental Health Care': 'Healthcare and Medical',
    'Musicians': 'Media and Entertainment',
    'Farming': 'Miscellaneous',
    'Computer Games': 'Media and Entertainment',
    'Strategic Management Services': 'Services',
    'Food and Beverage Manufacturing': 'Manufacturing',
    'Primary and Secondary Education': 'Education and Training',
    'Alternative Medicine': 'Healthcare and Medical',
    'Legislative Offices': 'Services',
    'Administration of Justice': 'Services',
    'Mobile Gaming Apps': 'Media and Entertainment'
}

# Map each fine-grained LinkedIn industry to one of the 12 meta-industries
meta_industry = udf(lambda x: meta_industries_12[x], StringType())
profiles_with_about_metaind = profiles_with_about_ind.filter(col('industries').isNotNull())
profiles_with_about_metaind = profiles_with_about_metaind.withColumn('meta_industry', meta_industry(col('industries')))


def get_num_of_words(string):
    # Number of tokens obtained by splitting the 'about' text on a single space character
    return len(string.split(" "))

udf_about_num_of_words = udf(get_num_of_words, IntegerType())

profiles_with_about_metaind = profiles_with_about_metaind.withColumn("target", udf_about_num_of_words(profiles_with_about_metaind["about"]))
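`get_num_of_words` splits on the single space character only. As a result, consecutive spaces produce empty tokens, and a newline does not separate two words. The plain-Python comparison below uses toy strings, not data from the dataset. It contrasts that rule with `str.split()` called without an argument, which splits on any run of whitespace:

```python
def count_words_single_space(text: str) -> int:
    """Same rule as get_num_of_words: split on the single space character " "."""
    return len(text.split(" "))


def count_words_any_whitespace(text: str) -> int:
    """Alternative rule: split on any run of spaces, tabs or newlines."""
    return len(text.split())


# Two spaces between words create empty tokens under the single-space rule
double_spaced = "Data  engineer,  Spark"
print(count_words_single_space(double_spaced))    # 5
print(count_words_any_whitespace(double_spaced))  # 3

# A newline does not separate words under the single-space rule
multi_line = "Data engineer\nSpark"
print(count_words_single_space(multi_line))    # 2
print(count_words_any_whitespace(multi_line))  # 3
```

Switching the UDF to the second rule would change the `target` values. It would therefore also change the per-industry averages and the RMSE reported below.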

In [0]:
profiles_with_about_metaind_select = profiles_with_about_metaind.select('about', 'meta_industry', 'target')

In [0]:
# Calculate the mean number of words in the 'about' section for each meta_industry
about_length_avg = profiles_with_about_metaind_select.groupBy('meta_industry').agg(avg('target').alias('avg_words_in_about'))

# Convert about_length_avg into a {meta_industry: average word count} dictionary, truncating averages to whole words
about_length_dict = about_length_avg.rdd.collectAsMap()
about_length_dict = {k: int(v) for k, v in about_length_dict.items()}

# Print the dictionary
# about_length_dict
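The `groupBy(...).agg(avg(...))` step followed by `int(v)` computes a per-group mean truncated toward zero. Below is a plain-Python sketch of the same computation on a few made-up `(meta_industry, word_count)` pairs. It lets you check the logic without a Spark session, and it assumes there are no missing word counts (Spark's `avg` would skip nulls):

```python
from collections import defaultdict


def mean_words_per_industry(rows):
    """rows: iterable of (meta_industry, word_count) pairs.

    Returns {meta_industry: mean word count}, truncated to an int
    like the {k: int(v)} step in the notebook.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for industry, words in rows:
        totals[industry] += words
        counts[industry] += 1
    return {industry: int(totals[industry] / counts[industry]) for industry in totals}


# Made-up rows, not taken from the dataset
rows = [("Technology", 80), ("Technology", 85), ("Services", 70), ("Services", 93)]
print(mean_words_per_industry(rows))  # {'Technology': 82, 'Services': 81}
```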

In [0]:
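# Average 'about' word count per meta-industry (see about_length_dict), ordered by the
# integer labels in label_translator: position i holds the average for label i
about_length_dict_ind = [77, 82, 71, 70, 71, 75, 82, 83, 86, 88, 73, 77]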

label_translator = {
    "Miscellaneous": 0,
    "Services": 1,
    "Transportation and Logistics": 2,
    "Retail and Consumer Goods": 3,
    "Healthcare and Medical": 4,
    "Government and Public Policy": 5,
    "Education and Training": 6,
    "Technology": 7,
    "Real Estate and Construction": 8,
    "Media and Entertainment": 9,
    "Manufacturing": 10,
    "Financial and Investment": 11
}
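`about_length_dict_ind` is a hand-written list. Its position `i` must hold the average for the meta-industry whose label in `label_translator` is `i`. If the two drift apart, predictions silently use the wrong industry. Below is a minimal sketch that builds the list from the two dictionaries instead. It assumes the labels are the contiguous integers `0..n-1` and uses a toy three-industry subset with illustrative values. With the notebook's objects the call would be `ordered_lengths(about_length_dict, label_translator)`:

```python
def ordered_lengths(lengths_by_name, name_to_index):
    """Return a list where position i holds the average length of the industry labelled i.

    Raises KeyError if an industry in name_to_index has no average, so a
    mismatch between the two dictionaries fails loudly instead of silently.
    """
    result = [None] * len(name_to_index)
    for name, index in name_to_index.items():
        result[index] = lengths_by_name[name]
    return result


# Toy subset with illustrative values
toy_lengths = {"Services": 82, "Technology": 83, "Miscellaneous": 77}
toy_labels = {"Miscellaneous": 0, "Services": 1, "Technology": 2}
print(ordered_lengths(toy_lengths, toy_labels))  # [77, 82, 83]
```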

In [0]:
from pyspark.sql.functions import col
import numpy as np

profiles_with_about_metaind_predicted = profiles_with_about_metaind.select('meta_industry', 'target')

lab2idx = udf(lambda x: label_translator[x],IntegerType())
profiles_with_about_metaind_predicted = profiles_with_about_metaind_predicted.withColumn("meta_industry", lab2idx(profiles_with_about_metaind_predicted["meta_industry"]))


# Collect both columns in a single pass so each target stays aligned with its industry label
np_data = np.array(profiles_with_about_metaind_predicted.select('meta_industry', 'target').collect())

# Split into (n, 1) arrays: integer meta_industry labels and 'about' word counts
np_meta_industry = np_data[:, [0]]
np_target = np_data[:, [1]]

In [0]:
from sklearn.metrics import mean_squared_error

# Create an empty list to store the predicted values
predicted_values = []

# Loop through each element in np_meta_industry
for meta_industry in np_meta_industry:
    # Predict the target as the average 'about' length of the profile's meta-industry (about_length_dict_ind is indexed by label)
    predicted_value = about_length_dict_ind[meta_industry[0]]
    predicted_values.append(predicted_value)

# Calculate the RMSE using the predicted values and np_target
rmse = np.sqrt(mean_squared_error(np_target, predicted_values))

# Print the RMSE
print("RMSE:", rmse)

RMSE: 71.50609421921602


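We see that this last method, which predicts each profile's 'about' length as the average length in its meta-industry, works better and achieves a lower RMSE (about 71.5 words) than the Regression Tree method.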
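The baseline above predicts every profile's word count as the mean of its meta-industry and scores the predictions with RMSE, the square root of the mean squared difference between observed and predicted counts. Below is a vectorized NumPy sketch of the same evaluation. It replaces the Python loop with array indexing and accepts either the `(n, 1)` arrays produced by `collect()` or flat lists. The numbers in the example are toy values:

```python
import numpy as np


def baseline_rmse(industry_idx, targets, lengths_by_idx):
    """RMSE of predicting each target as the mean length of its industry.

    industry_idx   : integer labels, shape (n,) or (n, 1)
    targets        : observed word counts, same length
    lengths_by_idx : lengths_by_idx[i] is the prediction for label i
    """
    labels = np.asarray(industry_idx).ravel().astype(int)
    y_true = np.asarray(targets, dtype=float).ravel()
    # Fancy indexing looks up one prediction per profile in a single step
    y_pred = np.asarray(lengths_by_idx, dtype=float)[labels]
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy data: industry 0 averages 70 words, industry 1 averages 90 words
print(baseline_rmse([0, 0, 1, 1], [60, 80, 90, 100], [70, 90]))  # about 8.66
```

With the notebook's arrays, `baseline_rmse(np_meta_industry, np_target, about_length_dict_ind)` should reproduce the value computed with `mean_squared_error`.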

# 6. Find keywords for the 'about' section 

In [0]:
# One row per experience entry; keep only the first half of each experience description
df_llm = profiles_with_about_metaind.select("name", "about", explode("experience").alias("experience_element"), "meta_industry")
df_llm = df_llm.withColumn("length", (length(col("experience_element.description")) / 2).cast("int"))

def split_description(description, half_length):
    # Truncate the description to its first half_length characters (missing descriptions pass through)
    if half_length is not None:
        return description[:half_length]
    return description

split_description_udf = F.udf(split_description, StringType())
df_llm = df_llm.withColumn("description", split_description_udf(col("experience_element.description"), col("length"))).select("name", "about", "description", "meta_industry")
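`split_description` keeps the first `len(description) // 2` characters, because Spark's `cast("int")` truncates the halved length. This can cut a word in two. Examples are visible in the grouped descriptions further down, such as "architec" and "imple". Below is a plain-Python sketch of the same rule, plus a hypothetical variant (`first_half_whole_words`, not part of the original pipeline) that backs off to the last space before the midpoint:

```python
def first_half(text):
    """Same rule as split_description: keep the first len(text) // 2 characters."""
    if text is None:
        return None
    return text[: len(text) // 2]


def first_half_whole_words(text):
    """Hypothetical variant: cut at the last space before the midpoint so no word is split."""
    if text is None:
        return None
    cut = first_half(text)
    last_space = cut.rfind(" ")
    # If there is no space in the first half, fall back to the plain cut
    return cut[:last_space] if last_space > 0 else cut


desc = "Taught two periods of a Cartooning class"
print(first_half(desc))              # Taught two periods o
print(first_half_whole_words(desc))  # Taught two periods
```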
      

In [0]:

key_prof = df_llm.withColumnRenamed("description","text")

# Import the required modules and classes
from sparknlp.base import DocumentAssembler, Pipeline
from sparknlp.annotator import (
    SentenceDetector,
    Tokenizer,
    YakeKeywordExtraction
)


# Step 1: Transforms raw texts to `document` annotation
document = DocumentAssembler() \
            .setInputCol("text") \
            .setOutputCol("document")
# Step 2: Sentence Detection
sentenceDetector = SentenceDetector() \
            .setInputCols("document") \
            .setOutputCol("sentence")
# Step 3: Tokenization
token = Tokenizer() \
            .setInputCols("sentence") \
            .setOutputCol("token") \
            .setContextChars(["(", ")", "?", "!", ".", ","])
# Step 4: Keyword Extraction
keywords = YakeKeywordExtraction() \
            .setInputCols("token") \
            .setOutputCol("keywords") \
            .setWindowSize(10) \
            .setNKeywords(5)

# Define the pipeline
pipeline = Pipeline(stages=[document, sentenceDetector, token, keywords])

# Fit the pipeline and apply it to the key_prof DataFrame
profiles_with_keywords = pipeline.fit(key_prof).transform(key_prof)

# Keep the description, its extracted keywords, and their YAKE metadata (score and sentence index)
profiles_with_keywords = profiles_with_keywords.select(col("text").alias("description"), col("keywords.result").alias("description_keywords"), "keywords.metadata")

profiles_with_keywords.limit(30).display()

description,description_keywords,metadata
,List(),List()
,List(),List()
,List(),List()
,List(),List()
"Program and operations manager for all TMA services, primarily focused on EZRide Shuttle (fixed route commuter shuttle linking commuter rail terminal and Cambridge worksites, with 2,000-3,000 daily boardings). Also oversee Emergency Ride Home program and commuter information programs. Manage NextBus AVL system for shuttles.","List(tma services, cambridge worksites, emergency ride, ride home, emergency ride home)","List(Map(score -> 0.5464460443720845, sentence -> 0), Map(score -> 0.5464460443720845, sentence -> 0), Map(score -> 0.43687228818560925, sentence -> 1), Map(score -> 0.43687228818560925, sentence -> 1), Map(score -> 0.39617831238884077, sentence -> 1))"
"Service coordinator for Waltham CitiBus network, as well as other TMA commuter shuttle routes. CitiBus provided fixed-route transit service to parts of Waltham not fully served by the MBTA. Also assisted with management of the Council's other shuttles (Alewife, Needham, and Bentley College).","List(waltham citibus, tma commuter, bentley college, waltham citibus network, tma commuter shuttle)","List(Map(score -> 0.6057436402052977, sentence -> 0), Map(score -> 0.5463402994396243, sentence -> 0), Map(score -> 0.5283089023826478, sentence -> 2), Map(score -> 1.0324447202802176, sentence -> 0), Map(score -> 0.882669009555137, sentence -> 0))"
,List(),List()
"The Star News covers Taylor County with publications such as: The Star News, The Star News Shopper. I create visual solutions for clients using a mix of creative skills and commercial awareness while having a good attitude and an open mind for constructive criticism.","List(star news, taylor county, star news, star news, news shopper, star news covers, star news shopper)","List(Map(score -> 0.08632855968821534, sentence -> 0), Map(score -> 0.3000139986750364, sentence -> 0), Map(score -> 0.08632855968821534, sentence -> 0), Map(score -> 0.08632855968821534, sentence -> 0), Map(score -> 0.30094279834533755, sentence -> 0), Map(score -> 0.31378701946054033, sentence -> 0), Map(score -> 0.17793336583739777, sentence -> 0))"
,List(),List()
Volunteer graphic designer on projects to help promote the non-profit solar energy industry.,"List(volunteer graphic, graphic designer, help promote, solar energy, energy industry)","List(Map(score -> 1.1369811712108198, sentence -> 0), Map(score -> 1.941529781644477, sentence -> 0), Map(score -> 1.2811431701438607, sentence -> 0), Map(score -> 1.2811431701438607, sentence -> 0), Map(score -> 1.2811431701438607, sentence -> 0))"


In [0]:
yake_score_df = profiles_with_keywords.select("metadata")
yake_score_df = yake_score_df.filter(size("metadata") > 0)
exploded_df_yake = yake_score_df.select(explode("metadata").alias("score_dict"))

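# Select the "score" value from the exploded column (stored as a string in the metadata map)
score_df_yake = exploded_df_yake.select(col("score_dict.score").cast("double").alias("score"))

# Calculate the mean YAKE score; in YAKE, a lower score means a more relevant keyword
mean_score_df_yake = score_df_yake.agg(expr("avg(score)"))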

# Show the resulting mean score
mean_score_df_yake.show()


+------------------+
|        avg(score)|
+------------------+
|0.6076924729907589|
+------------------+
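In YAKE, a lower score marks a more relevant keyword. The average of about 0.61 should therefore be read as "lower is better", and the keywords within a row are best ranked by ascending score. The `metadata` column stores each score as a string. Below is a plain-Python sketch that ranks one row's keywords and averages its scores. It uses the values printed above for the "Volunteer graphic designer" description:

```python
from statistics import mean


def rank_keywords(keywords, metadata):
    """Pair each keyword with its YAKE score and sort best (lowest) first.

    Duplicate keywords keep their lowest score. Scores arrive as strings,
    as in the annotation metadata above, so they are converted to float.
    sorted() is stable, so tied keywords keep their original order.
    """
    best = {}
    for keyword, meta in zip(keywords, metadata):
        score = float(meta["score"])
        if keyword not in best or score < best[keyword]:
            best[keyword] = score
    return sorted(best.items(), key=lambda pair: pair[1])


def mean_score(metadata):
    """Average of all scores in one row, duplicates included (same averaging as avg(score))."""
    return mean(float(meta["score"]) for meta in metadata)


# Values copied from the "Volunteer graphic designer" row of the output above
keywords = ["volunteer graphic", "graphic designer", "help promote", "solar energy", "energy industry"]
metadata = [
    {"score": "1.1369811712108198", "sentence": "0"},
    {"score": "1.941529781644477", "sentence": "0"},
    {"score": "1.2811431701438607", "sentence": "0"},
    {"score": "1.2811431701438607", "sentence": "0"},
    {"score": "1.2811431701438607", "sentence": "0"},
]
ranked = rank_keywords(keywords, metadata)
print(ranked[0][0])   # volunteer graphic
print(ranked[-1][0])  # graphic designer
print(round(mean_score(metadata), 4))  # 1.3844
```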



# 7. Gemini

In [0]:
import pathlib
import textwrap

import google.generativeai as genai

from IPython.display import display
from IPython.display import Markdown


def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

In [0]:
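from pyspark.sql import Row
from pyspark.sql.functions import collect_list

# Drop rows with missing values (e.g. experience entries without a description),
# then gather all descriptions of each profile into one list
df_llm = df_llm.dropna()
grouped_df_llm = df_llm.groupBy('name', 'meta_industry', 'about').agg(collect_list('description').alias('descriptions'))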
def get_generate_prompt(content, industry):
    #prompt = f"Extract key keywords or phrases from the following text: {content}"
    #prompt = prompt + """
    #1. Identify and list the most important keywords or key phrases in the text. These keywords should capture the main topics, concepts, or subjects of work #experience and job skills discussed in the text.
    #2. If there are subtopics or secondary themes mentioned in the text, list them as well.
    #3. Include the exact text span or sentence where each keyword or phrase is found in the original text.
    #4. Consider the context, relevance, and frequency of the keywords when determining their significance.
    #"""
    content_txt = " | ".join(content)
    #prompt = f"Briefly write the description that will maximize your chances of being hired based on the following about-me section: {content}. "
    
    # Target length: the average number of words in the 'about' sections of this meta-industry
    desired_length = about_length_dict[industry]
    prompt = f"Your task is to generate a new 'about' paragraph with an average length of {desired_length} words. Take into account that the job industry is {industry}, and build upon the provided list of past job experience: {content_txt}"
    prompt = prompt + """
    1. The paragraph should make a good impression that will maximize the chances of getting hired.
    2. Focus on closely aligning with the original paragraph's themes and data, highlighting your key strengths and experiences in a balanced and realistic manner.
    3. Ensure that your new paragraph maintains coherence and relevance to the job search context, presenting yourself in the best possible light without overstating your qualifications.

    """
    prompt = prompt + f"4. Try to incorporate keywords from the given past job experience that are related to the {industry} industry."

    return prompt
    
def text_generator(text, industry):
    # Build the agent prompt
    prompt = get_generate_prompt(text, industry)

    # Use the Gemini-Pro model to generate content (a response object; its .text holds the paragraph)
    generated_text = model.generate_content(prompt)

    return generated_text

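import os
from getpass import getpass

# Read the API key from the GEMINI_API_KEY environment variable, or prompt for it, instead of hard-coding it in the notebook
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY") or getpass("INSERT YOUR GOOGLE API KEY PLEASE:")
genai.configure(api_key=GEMINI_API_KEY)
model = genai.GenerativeModel('gemini-pro')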

# Collect the first 10 grouped profiles to the driver
dataCollect = grouped_df_llm.limit(10).collect()

# Generate a new 'about' paragraph from the descriptions and meta_industry of each row
new_data = [Row(**{**row.asDict(), 'generated_description': text_generator(row['descriptions'], row['meta_industry']).text}) for row in dataCollect]

# Convert the list of Rows back to a Spark DataFrame
generated_desc_df = spark.createDataFrame(new_data)
generated_desc_df.display()

name,meta_industry,about,descriptions,generated_description
"""Doctor"" Phil Bernstein",Media and Entertainment,"WHAT I DO: I help small and medium-sized Pacific Northwest businesses connect with their target customers, deliver their message, and generate more sales. I do it with broadcast radio advertising and precisely-targeted digital campaigns. WHY IT WORKS: Because I’m able to offer a customized advertising plan (leveraging radio and the most effective digital platforms) I can tailor the best methods to reach YOUR specific audience where they are. Based on your target audience’s demographics, we'll craft a personalized, custom plan to maximize your reach and impact. We'll choose specific channels and outlets (broadcast or digital) where your potential customers are listening, reading and surfing. Then we'll use my copywriting skills to craft a persuasive message that compels your target to take action. WHO I HELP: ☛ Auto dealerships ☛ Financial services firms ☛ Funeral homes ☛ Furniture stores ☛ Home remodelers ☛ Insurance agencies and brokers ☛ Medical practices ☛ Mortgage companies ☛ Law firms ☛ Dental practices ☛ Retail stores ☛ Plumbers ☛ Senior living communities ☛ Any business that needs to find more customers and make more sales Ready to talk? Call me at 503-323-6611, or email me at philbernstein@iheartmedia.com.","List(As The Sales ""Doctor"" I write a column on sales and marketing issues for The Paint Contractor, a monthly magazine dedicated to the professional painter., WHAT I DO: I help television station sales departments increase their direct revenue, and teach television Account Executives a system for conducting a thorough and powerful needs analysis, revealing the client’s true advertising budget, building a proposal that earns a big piece of that budget, presenting the proposal, closing the businesses, and making it stick. 
COMPANIES I’VE WORKED WITH: ☆ Cordillera Communications ☆ Cox Media Group ☆ Gray Television ☆ Hearst Television ☆ Heartland Media ☆ Paxton Media Group ☆ Raycom Media ☆ Sarkes Tarzian, Inc ☆ Sinclair Broadcast Group ☆ TEGNA ☆ Tribune Media WHY IT WORKS: Each year, Jim Doyle & Associates consultants meet with hundreds of local advertisers in markets big and small, all over the United States. We accompany the Account Executives on sales call, teaching them by example how to build rapport with advertisers, gather the information they need, build and present a powerful television and digital marketing prop, Sold advertising and marketing solutions using the radio, online, and other digital tools of seven local radio stations. The process began with cold calling and proceeded through needs analysis, presentation, closing, and follow-up. As my career continued, I recognized a desperate need in the business community for good, professional , Sold program advertising, fence signs, radio time, season tickets, group sales, and promotional package for the the Beavers, a AAA club then in the Philadelphia Phillies organization. Managed souvenir program and novelty sales, hiring and supervising employees, ordering and tracking inventory, and collecting and depositing revenues. Spoke to community groups, service clubs, and schools to develop better public awareness of Portland Beaver baseball and to generate sales.)","As an accomplished sales strategist and marketing expert in the media and entertainment industry, I possess a proven track record of helping television stations maximize their direct revenue streams. Through my column in The Paint Contractor, I share my insights on sales and marketing best practices. Drawing upon my extensive experience with renowned organizations including Cordillera Communications and Hearst Television, I empower Account Executives with comprehensive sales techniques. 
My expertise encompasses in-depth needs analysis, compelling proposal development, persuasive presentations, and effective sales closing strategies. I have successfully implemented these techniques with hundreds of local advertisers across the nation, leveraging my deep understanding of the market and the ability to build strong relationships. My core competencies include: - Sales and Marketing Strategy Development - Television and Digital Advertising Solutions - NEEDS Analysis and Value Proposition Creation - Proposal Writing and Client Acquisition - Presentation and Negotiation Skills - Client Relationship Management and Retention"
A Mohamed Faizal Shariff,Services,"A Mohamed Faizal Shariff, an IT professional with 9+ years of work experience in the field of Software development using Salesforce platform, Siebel, Oracle CRM-OnDemand application. Expert Software Developer with over 7+ End to End Salesforce CRM Implementations (Cross product & Industry Verticals) for customers across diverse geographies in the areas of: • CRM Consulting, Innovation & Solution Design (Sales / Service & Support) • Integrating Salesforce with third party interfaces using REST API Expert Software Developer dedicated to constantly improving tools and infrastructure to maximize productivity, minimize system downtime and quickly respond to the changing needs of the business. Developed superior design and debugging capabilities, innovative problem-solving skills, and dedication to quality. Proficient in delivering value-based consulting services (Process, Strategy and Product), designing Innovative solution and new Product/App, establishing governance, architecture and delivery standards for customers. Expert in Business process Design & Re-engineering, Customer & User Experience Management, Solution Design & Architecture, Business analysis. Experienced in defining Solution, Application/System, Information, Security, Integration and Technical Architectures. Extensive hands-on experience in Apex, Visualforce, JavaScript, jQuery, Financial Force (PSA), Force.com migration tools (Force.Com IDE, Data loader and Change Sets). Possess hands-on experience in developing lightning applications using Lightning Web components.","List(● Worked alongside the executive team to define quarterly and yearly goals for my pod. Determined the timelines for upcoming projects based on the LOEs from Dev and QA ● Involved in application development lifecycle activities that include Analysis, Research, Design, Development and Unit Testing. ● Provided Data Migration Strategy document and POC of Shield (Encrypted Data Migration). 
● Exclusively worked on Migrating 8 years of legacy data which includes attachments. ● Created various Reports (summary reports, matrix reports, dashboards, pie charts, and graphics) and folders to assist managers to properly utilize Salesforce as a sales too, A Mohamed Faizal Shariff, an IT professional with 7+ years of work experience in the field of Software development using Salesforce platform, Siebel, Oracle CRM-OnDemand application. Expert Software Developer withover 4+ End to End Salesforce CRM Implementations (Cross product & Industry Verticals) for customers across diverse geographies in the areas of: • CRM Consulting, Innovation & Solution Design (Sales / Service & Support) • Integrating Salesforcewith third party interfaces using REST API Expert Software Developer dedicated to constantly improving tools and infrastructure to maximize productivity, minimize system downtime and quickly respond to the changing needs of the business. Developed superior design and debugging capabilities, innovative problem solving skills and dedication to quality. Proficient in delivering value based consulting services (Process, Strategy and Product), designing Innovative solution and new Product/App, establishing governance, architec,  Worked under SAFe Agile methodology and Waterfall Methodology.  Having experience on Financial Force PSA application.  Experienced all phases of Salesforce Software Development Life Cycle (SDLC) and project life cycle processes from analysis, design, development, testing, imple,  Working as Senior Software Engineer in Infosys, Chennai.  Installation and configuration of Salesforce Environments.  Responsible for creating Fields, Objects, Tabs, Page Layouts, Field Level Security, Dependent Pick lists, Record Types, Relationships, Assignment Rules and Custom Setting)","Mohamed Faizal Shariff brings over 7 years of comprehensive experience in the Services industry, leveraging his expertise in Software Development and Salesforce platform. 
Throughout his career, he has consistently exceeded expectations in various roles, including providing valuable consulting services, designing innovative solutions, and optimizing operational processes. His proven ability to define strategic goals, map out project timelines, and deliver successful implementations make him a highly sought-after professional in the Services domain. Furthermore, his contributions to Data Migration, including the development of a Data Migration Strategy document and Proof of Concept of Shield, demonstrate his deep understanding of data management and security best practices. His proficiency in integrating Salesforce with third-party interfaces, adhering to Agile and Waterfall methodologies, and leveraging Financial Force PSA application further solidifies his versatility and expertise within the Services industry."
A-aron Peters,Technology,"""A-aron"" Peters (usually signed “All my best ~AMP”) Highly detailed, process driven, internally motivated extrovert. I meet strangers for about 3 seconds, but my closest friends (and wife) still consider me enigmatic, despite years of working to be transparent. Often referred to as “123”, the tiniest minutiae stands out, and there is an overwhelming drive to “make it fit, or correct”. People are my drive. Ever since I was small, I have wanted to make an impact on people’s lives. At one point I told my mother I wanted to be a bartender (THAT was not acceptable in our Baptist household!!). “Empathy” and “sympathy” are not words typically associated with my personality. There is a firm belief that “if you want something better, you (and only you) have the power to make it better, you just have to find the will.” “Thinkers” are my heroes. Rudyard Kipling wrote a poem “If” and outside of the Bible, that has GOT to be the best piece of prose ever put to paper ( See below) If you can keep your head when all about you Are losing theirs and blaming it on you; If you can trust yourself when all men doubt you, But make allowance for their doubting too: If you can wait and not be tired by waiting, Or, being lied about, don't deal in lies, Or being hated don't give way to hating, And yet don't look too good, nor talk too wise; If you can dream - and not make dreams your master; If you can think - and not make thoughts your aim, If you can meet with Triumph and Disaster And treat those two impostors just the same:. 
If you can bear to hear the truth you've spoken Twisted by knaves to make a trap for fools, Or watch the things you gave your life to, broken, And stoop and build'em up with worn-out tools; If you can make one heap of all your winnings And risk it on one turn of pitch-and-toss, And lose, and start again at your beginnings, And never breathe a word about your loss: If you can force your heart and nerve and sinew To serve your turn long after they are gone, And so hold on when there is nothing in you Except the Will which says to them: ""Hold on!"" If you can talk with crowds and keep your virtue, Or walk with Kings - nor lose the common touch, If neither foes nor loving friends can hurt you, If all men count with you, but none too much: If you can fill the unforgiving minute With sixty seconds' worth of distance run, Yours is the Earth and everything that's in it, And - which is more - you'll be a Man, my son! “Convenience is the author of complacency, but decision is the enabler of intention” All my best ~AMP",List(Providing solutions to reduce stress and add value for our business partners.),"With a background in technology, I excel at leveraging my expertise to provide innovative solutions that enhance efficiency and streamline processes. My past experience in developing and implementing technological frameworks, data analysis, and project management has honed my ability to identify areas of improvement and create tailored solutions. I am adept at understanding business needs and translating them into actionable technical plans. My drive to reduce stress and add value for stakeholders has consistently driven me to seek out opportunities to automate tasks, optimize workflows, and enhance collaboration. I am eager to contribute my skills and experience to a dynamic organization where I can make a meaningful impact through technology."
AJ (Aaron) Brody,Technology,"Started waiting on tables, progressed to AT&T long distance sales, into finessing print and internet ad sales, to lastly membership sales. I have ample experience in selling anything to everyone. I also had the opportunity to try my hand as a writer/columnist. I have been published in newspaper, magazine as well as on the internet.My experience in phone sales brought me to creating and running two Business Development Departments with in the LA Car Guy Auto Group. Successfully ran a team of up to six where we would use excellent customer service skills coupled with our product knowledge to pre-sell and schedule a sales appointment. I am currently Business Development Director at Toyota Santa Monica, where I revived the department to smash all expected goals and help the dealership out gross month to month. My creative edge mixed with my practical and empirical learning has crafted my ability to succeed in all that I do. He who loses his dreams, loses his soul along with it.","List(Director of business development and Auto Alert Team, Matchmaking for the discerning professional in Southern California. Providing discrete, and personalized matchmaking for the upscale SoCal community., As an account executive for Outlook, I introduce the niche gay and lesbian market to businesses looking to advertise.)","With a proven track record in business development and matchmaking, I am eager to contribute my expertise to the technology industry. As Director of Business Development for Auto Alert Team, I successfully expanded our reach, connecting discerning professionals with tailored solutions. Prior to that, as an Account Executive for Outlook, I effectively introduced the niche gay and lesbian market to businesses, leveraging my understanding of target demographics. My strong communication skills, relationship-building abilities, and passion for technology make me an ideal candidate to drive growth and innovation in your organization. 
I am confident that my strategic thinking, market analysis capabilities, and ability to forge lasting partnerships will enable me to excel in this role and contribute to your company's success."
AJ Ferrara,Government and Public Policy,"Experienced formal and informal educator with a demonstrated history of working in the civic & social organization industry. Skilled in Environmental Awareness, Environmental Education, Teamwork, and Data Analysis, with additional background in field geology and paleontology collections. Strong administrative professional with a Bachelor of Science (BS) focused in Geological and Environmental Sciences.","List(At one of the Park Service's busiest Visitor Education Centers I led tours, gave short and long form interpretive programs on topics ranging from ecology to history to geology, and roved geyser basins to make contact with Yellowstone visitors. In peak season the Old Faithful Visitor Education Center receives upwards of 15,000 visitors per day. As a result opportunities for informal interpretation abound, as do situations that, As a Program Assistant at the Los Angeles Zoo my job, first and foremost, is ensuring that formal interpretive programs run smoothly. To those ends I delivered and assisted in delivery of formal and informal educational events. I regularly prepare classrooms and accommodations for out guests, including sett, As Senior Curatorial Technician I worked as part of a team to ensure our paleontological collections were maintained for easy future access and study. Primarily we worked to catalogue, organize, and protect a warehouse of fossils, mostly found during excavation for construction projects around Orange County, dating back to the 1, During my time at the Palos Verdes Peninsula Land Conservancy I completed a range of assignments, from assisting in habitat restoration projects, to designing and updating exhibits, to leading tour groups. Indoors and out I acted as a point of contact for the public who used the spaces were preserved, discussing the work we were doing, the value of natural spaces, and the virtues of our local plants and animals., Taught two periods of a Cartooning class. 
Additionally acted as a camp counselor helping to organize events, working backstage at the camp's musical show, and putting on an art show to showcase the class's work. Worked with children age nine to fourteen.)","Through my diverse roles in public engagement, education, and conservation, I have fostered a deep understanding of environmental and historical interpretation, public policy, and stakeholder relations. In the dynamic environment of Yellowstone National Park, I excelled as a lead interpreter, crafting engaging programs on diverse topics for audiences of all ages and backgrounds. As a Program Assistant at the Los Angeles Zoo, I coordinated educational events, ensuring their smooth execution. My experience at the Palos Verdes Peninsula Land Conservancy equipped me to navigate the intersection of natural resource management and community outreach. I have also contributed to the field of paleontology as a Senior Curatorial Technician, maintaining invaluable fossil collections for future research and public access. These experiences have honed my skills in communication, collaboration, and environmental stewardship, enabling me to make a meaningful impact in the realm of Government and Public Policy."
ALEJANDRO MUÑOZ G. FundaciónInternacionalParaElReencuentro,Media and Entertainment,"The Colombian journalist ALEJANDRO MUÑOZ GARZON, creator of the FOUNDATION INTERNATIONAL FOR THE REUNION, www.funreencuentros.com/ pioneered national television to start with conducting historical research encounters, some of which were and are issued by national channels and Rafael Poveda foreign and TV, Caracol TV, Tevecine, JES Productions, Univision and Telemundo. Equally unique is the pioneer and promoter of desaprecidos search activities in media and packaging of consumer products, including milk cartons ""ALPINA"", campaign between 2001 and 2005 in Colombia; period during which about Colombian one hundred patients disappeared, they returned to their homes with the publication of photographs in dairy packaging. El periodista Colombiano ALEJANDRO MUÑOZ GARZON, gestor de la FUNDACION INTERNACIONAL PARA EL REENCUENTRO, www.funreencuentros.com/ es pionero en la televisión nacional al iniciar con sus investigaciones la realización de históricos reencuentros, algunos de los cuales fueron y son emitidos por canales nacionales y extranjeros como Rafael Poveda Televisión, Caracol TV, Tevecine, Producciones JES, Univisión y Telemundo. 
Igualmente es pionero y único impulsador de actividades de búsqueda de desaprecidos en medios de comunicación y empaques de productos de consumo masivo, entre ellos en las cajas de leche ALPINA, campaña realizada entre los años 2001 y 2005 en Colombia; periodo durante el cual, cerca de un centenar de Colombianos enfermos desaparecidos, regresaron a sus hogares gracias a la publicación de fotografias en los empaques lacteos.","List(Investigo el paradero de Padres, Madres, Hermanos e Hijos de origen latinoamericano que se han desaparecido por muchos años y por razones de violencia intrafamiliar o adopción y una vez ubicados y asistidos sicoafectiva y médicamente tras la aceptación y comprobacion científica (Test ADN sí es necesario) procedemos a ayudarlos en el logro del abrazo tan esperado, As benefactor and volunteer for the Foundation for the Reunion, www.funreencuentros.com/ I have witnessed for more than twelve years the outstanding work of this Foundation located in Bogota, Colombia. This institution was created to reacquaint healthy and successfully adopted Colombian individuals with their biological families, through a process of psycho-emotional adjustment accompanied by professionals in the field of documentary research, tracking, location and preparation for the reunion. During this time, while gaining experience on the issue of family separation and reunion, this institution that I highly recommend has performed an average of 10,669 reunions of Colombians lost from their families. Among them, 3,599 reunions have been made for Colombian adopted who sought, REPORTERO Y PRESENTADOR DE NOTAS PERIODISTICAS CON TINTE HUMORISTICO COMPROBANDO TEMAS Y REFRANES POPULARES COMO: ""DE TAL PALO TAL ASTILLA"" ""MATRIMONIO Y MORTAJA DEL CIELO BAJA"" ""LAS COSAS SE PARECEN A SUS DUENOS"" ETC. 
EN SABADOS FELICES SECCION LAS AVENTURAS DEL MACHORR)","For over a decade, my passion for bridging familial connections has driven me as a volunteer at the Foundation for the Reunion, where I've witnessed the transformative power of reuniting long-lost loved ones. My experience in uncovering their whereabouts and facilitating their psychoemotional recovery has honed my skills in documentary research, tracking, and location. Through the lens of popular sayings, I've explored the human condition as a reporter and presenter for Sábado Felices, delving into the themes of family bonds, destiny, and personal growth. These experiences have instilled in me a deep understanding of the emotional complexities and the profound impact of reconnecting families. I am eager to translate these skills into the dynamic realm of Media and Entertainment, where I can utilize my storytelling abilities and research acumen to engage and inspire audiences through compelling narratives that celebrate the resilience of the human spirit and the indomitable power of family connections."
ALLISON CHUNG,Education and Training,"Experienced Accounts Payable Specialist with a demonstrated history of working in the primary/secondary education industry. Skilled in Nonprofit Organizations, Microsoft Word, Team Building, Fundraising, and Leadership. Strong operations professional with a Master’s degree (M.S) focused in Human Service Counseling: Marriage and Family from Liberty University.","List(Performs difficult clerical tasks involving the application of bookkeeping principles and account keeping practices in order to maintain the financial accounts and records. Prepares and inputs accounting data into the Oracle financial system. Reviews invoice, Responsible for all aspects of the accounts payable process.Tracked the budget on a weekly basis during the accounts payable process. Responsible for reconciliation of all credit card payments. Reviewed and processed all expense reports for the company. Prepared billing invoices. Updated employee records. Responsible for new hire orientation. Responsible for all aspects of 1099 preparation. Assisted in payroll processes. Assisted in other fiscal and human resource duties as assigned., Prepared and entered all lockbox payments into the automated accounting system.Posted payment batches to the correct accounting period and prepared, filed and distributed final reports. Reviewed and processed expense reports and refund requests for appropriate authorizations and assigned the correct accounting entry.Prepared and entered journal entri)","I am a highly skilled and experienced professional with a proven track record in Education and Training. Throughout my career, I have consistently exceeded expectations in various roles within the industry, leveraging my expertise in financial management, accounting principles, and human resource administration. 
As a former accounts payable specialist responsible for all aspects of the process, I meticulously tracked budgets, reconciled credit card payments, and processed expense reports. My ability to maintain accurate financial records and ensure compliance with industry standards is a testament to my unwavering attention to detail. Furthermore, I have extensive experience in payroll processing, new hire orientation, and 1099 preparation, demonstrating my comprehensive understanding of human resource functions. My versatility and adaptability have enabled me to provide support in both fiscal and human resource capacities, showcasing my commitment to efficiency and teamwork. I am confident that I can bring my knowledge and skills to your Education and Training organization and make a significant contribution to its success."
ASHLEA HERRING,Retail and Consumer Goods,I am an outgoing fun people loving person.,"List(Calling the Client, picking up the client taking the client to their destination, cleaning the limo or party bus and pre trip and post trip)","Throughout my tenure in the retail and consumer goods industry, I have consistently exceeded expectations in providing exceptional customer service and maintaining the highest standards of excellence. As a Limo/Party Bus Driver, I excelled in ensuring the safety, comfort, and satisfaction of clients. I proactively called clients, punctually picked them up, and safely transported them to their destinations. Maintaining a pristine appearance of the vehicles was paramount, as I meticulously cleaned and inspected both limos and party buses before and after each trip. My keen attention to detail and commitment to providing a seamless experience have earned me a reputation for reliability and professionalism. I am confident that my skills and experience would make me a valuable asset to any organization seeking a dedicated and customer-focused individual in the retail and consumer goods sector."
Aaron Acuna,Government and Public Policy,"Have worked State/Government jobs since 2018. Good with data entry in Excel and copy writing/editing in Word. Have extensively used Teams and Outlook but am proficient in Google systems as well (Calendars, Sheets, etc.). From 2016-2018 I worked as a student employee for the University of Alaska, Anchorage (UAA) in various jobs, including as (1) a Library Intern (designing posters, working the front desk), (2) an English Department Intern (writing the website's scholarships page), (3) an Information Desk Clerk for the Learning Center, and (4) an English Tutor (proofreading/editing student papers, running the ESL Conversation Group). I was the unofficial records custodian for Alaska Occupational Safety & Health (AKOSH, or Alaska OSHA) from 2018-2020 in which I handled confidential records requests and requests for information under the Alaska Public Records Act (APRA). At that time, I took an Excel 101 course as voluntary job training. From 2020-2022 I spent most of my job at the State Ombudsman's Office copy-writing nuanced letters going out to various types of clients, including confidential responses to citizens and other state agencies. Since 2022 I've been working with the State of Alaska's FEMA Team in the Department of Health, in which I track and enter budget data in extensive Excel spreadsheets","List(FEMA Team data reviewer - Remote date entry, file organization, team meetings)","With a proven track record as a FEMA Team Data Reviewer, I bring a comprehensive skillset to government and public policy roles. My expertise in remote data entry, file organization, and team collaboration has honed my ability to handle sensitive information and contribute to high-priority initiatives. I am adept at extracting critical data, maintaining meticulous records, and ensuring efficient workflow. 
Moreover, my participation in team meetings has strengthened my communication and coordination abilities, enabling me to work effectively in a diverse and results-oriented environment. I am confident that my experience and commitment to accuracy, confidentiality, and public service will make me a valuable asset to any organization seeking to advance its mission in the government and public policy sector."
Aaron Baltz,Manufacturing,Pursuing a career in non-destructive testing and aerospace inspection.,"List(Electro-Mechanical Inspection, Tear down, evaluate and repair aircraft heavy engine nacelles and reversers.)","Seeking a Manufacturing role where I can leverage my expertise in Electro-Mechanical Inspection, gained through meticulously evaluating and repairing aircraft heavy engine nacelles and reversers. My proficiency in troubleshooting and repairing complex mechanical systems translates seamlessly to the manufacturing environment, enabling me to identify and resolve production-related issues promptly and effectively. With a keen eye for detail, I possess the ability to assess equipment functionality, diagnose faults, and implement corrective actions, ensuring optimal operational efficiency and minimizing downtime. Additionally, my familiarity with industry-specific tools and methodologies enables me to work collaboratively with production teams to optimize processes and enhance quality control measures, ultimately contributing to enhanced productivity and cost reduction within the manufacturing sector."
