Synthetic-Experts

Approximate Generative AI with fine-tuned LLM for complex Classification Tasks

Download the working paper from SSRN

Creating Synthetic Experts with Generative Artificial Intelligence

Project Website

www.synthetic-experts.ai

Notebooks in this Repo

Using the MMX Synthetic Expert

Quickly identify which marketing mix (MMX) variable a text pertains to, if any: SyntheticExperts_Quickstart_MMXClassifier.ipynb
Batch processing for rapidly identifying MMX variables at scale: SyntheticExperts_Predict_Texts_with_MMXClassifier.ipynb
Reveal differences in consumer sentiment across brands' MMX from texts and discover MMX-specific topics: SyntheticExperts_MMX_Sentiment_x_Topics.ipynb

Creating a Synthetic Expert

Label texts using generative AI, with GPT4 via OpenAI's API as the example: SyntheticExperts-Labels-from-Generative-AI.ipynb
Approximate a powerful generative AI for a specific task by fine-tuning a foundational large language model: SyntheticExperts_Train_MMX_Classifier.ipynb

Anonymizing texts with Synthetic Twins

Create replicas of texts that capture their idea and meaning but obfuscate identifying information with generative AI: SyntheticExperts_Create_Synthetic_Twins_of_Texts.ipynb

GPU setup on Apple M1 and M2 systems

Get your Apple notebook ready to create Synthetic Experts and rapidly identify constructs of interest in vast amounts of text: Setup-MacBook-M2-Pytorch-TensorFlow-Apr2023.ipynb

Application: Identifying Marketing Mix Variables (4Ps of Marketing) in Tweets

MMX Classifier

You can use this classifier to determine which of the 4Ps of marketing, also known as marketing mix variables, a microblog post (e.g., Tweet) pertains to:

Product
Place
Price
Promotion

This classifier is a fine-tuned checkpoint of [cardiffnlp/twitter-roberta-large-2022-154m](https://huggingface.co/cardiffnlp/twitter-roberta-large-2022-154m). It was trained on 15K Tweets that mentioned at least one of 699 brands. The Tweets were first cleaned and then labeled using OpenAI's GPT4.

Because this is a multi-label classification problem, we fine-tune with binary cross-entropy (BCE) with logits loss, which combines a sigmoid layer and BCELoss in a single class. The model therefore outputs raw logits: to obtain the probability for each label (i.e., marketing mix variable), you need to pass the predictions through a sigmoid function. This is already done in the accompanying Python notebook.
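As a minimal sketch of this step, the snippet below turns raw logits into per-label probabilities with a sigmoid and shows the numerically stable form of the BCE-with-logits loss. The label order, the 0.5 threshold, and the example logits are assumptions made for illustration; the actual label mapping is defined by the checkpoint, and the notebooks remain the reference implementation.

```python
import math

# Assumed label order for illustration only; check the checkpoint's config.
LABELS = ["Product", "Place", "Price", "Promotion"]


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_with_logits(logits, targets) -> float:
    """Mean binary cross-entropy computed directly on logits.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)),
    which equals -[y*log(s(x)) + (1-y)*log(1-s(x))] with s = sigmoid.
    """
    losses = [
        max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        for x, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


def decode(logits, threshold: float = 0.5):
    """Map one text's logits to label probabilities and predicted labels."""
    probs = {label: sigmoid(x) for label, x in zip(LABELS, logits)}
    predicted = [label for label, p in probs.items() if p >= threshold]
    return probs, predicted


# Made-up logits for a single text (not real model output).
example_logits = [2.0, -3.0, 1.0, -1.0]
probs, predicted = decode(example_logits)
print(predicted)
```

Because each label gets its own independent sigmoid, a text can be assigned several marketing mix variables at once, or none at all if every probability stays below the threshold.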

IMPORTANT: At the time of writing this description, Hugging Face's pipeline did not support multi-label classifiers.
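One way to work without the pipeline is to call the tokenizer and model directly and apply the sigmoid yourself. The sketch below assumes the `transformers` and `torch` packages are installed; `predict_mmx` is a hypothetical helper name, and `model_id` is a placeholder you must replace with the path or Hub ID of the fine-tuned MMX checkpoint (it is not specified here).

```python
from typing import Dict, List


def predict_mmx(texts: List[str], model_id: str) -> List[Dict[str, float]]:
    """Return one {label: probability} dict per input text.

    `model_id` is a placeholder for the fine-tuned checkpoint's location.
    """
    # Imported inside the function so this module can be loaded
    # even on machines without torch/transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Tokenize the whole batch; pad to the longest text, truncate overlong ones.
    encoded = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    with torch.no_grad():
        logits = model(**encoded).logits  # shape: (num_texts, num_labels)

    # Multi-label: independent sigmoid per label, not a softmax across labels.
    probs = torch.sigmoid(logits)

    id2label = model.config.id2label  # label names stored with the checkpoint
    return [
        {id2label[i]: float(p) for i, p in enumerate(row)}
        for row in probs
    ]


# Example call (commented out because it needs a real checkpoint):
# predict_mmx(["Great deal on sneakers today!"], model_id="path/to/checkpoint")
```

Reading the label names from `model.config.id2label` avoids hard-coding the label order, which depends on how the checkpoint was trained.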

Quickstart

Check out this Python notebook to try out an MMX Synthetic Expert: SyntheticExperts_Quickstart_MMXClassifier.ipynb

Demo Datasets

Demo datasets are available in the Data folder. The texts in these datasets are based on real Tweets that were rewritten by an AI. I call these data Synthetic Twins.

Synthetic Twins correspond semantically in idea and meaning to the original texts. However, wording, people, places, firms, brands, and products were changed by an AI. As such, Synthetic Twins mitigate, to some extent, possible privacy and copyright concerns. If you'd like to learn more about Synthetic Twins, another generative AI project by Daniel Ringel, then please get in touch! dmr@unc.edu

You can create your own Synthetic Twins of texts with this Python notebook: SyntheticExperts_Create_Synthetic_Twins_of_Texts.ipynb, available as a BETA version (still being tested) in this repo.

Citation

Please cite the following reference if you use Synthetic Experts and/or Synthetic Twins in your own research or projects:

Ringel, Daniel, Creating Synthetic Experts with Generative Artificial Intelligence (July 15, 2023). Available at SSRN: https://ssrn.com/abstract=4542949
