# Sample Cloud SQL Connection

# Required for each instance: add your VM's IP to the Authorized Networks
**1. Find the external IP of your JupyterLab VM**
https://console.cloud.google.com/compute/instances?project=adsp-34002-on02-sopho-scribe&authuser=1
* Go to VM Instances.
* Find your JupyterLab VM.
* Copy the External IP address (looks like 34.91.100.45).

**2. Add the VM's IP to your Cloud SQL authorized networks**
https://console.cloud.google.com/sql/instances/currensee-sql/connections/networking?authuser=1&project=adsp-34002-on02-sopho-scribe
* Go to Cloud SQL instances.
* Click your instance.
* Click Connections in the left sidebar.
* Scroll to Authorized networks → Add network.
* Name: anything descriptive, such as jupyterlab-vm.
* Network: paste the external IP you just copied, followed by /32 (e.g., 34.91.100.45/32).
* IMPORTANT: the /32 suffix restricts access to that single IP.
* Click Save.

The change takes about 30 seconds to apply.
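
The `/32` suffix from step 2 can be produced and checked programmatically. The sketch below uses Python's standard `ipaddress` module; the helper name `to_single_host_cidr` is hypothetical and not part of this project, and the sample address is the placeholder used in the steps above.

```python
import ipaddress


def to_single_host_cidr(external_ip: str) -> str:
    """Return the IPv4 address in CIDR form with a /32 mask (one host only)."""
    # IPv4Address raises a ValueError subclass on malformed input such as "34.91.100"
    address = ipaddress.IPv4Address(external_ip.strip())
    network = ipaddress.ip_network(f"{address}/32")
    return str(network)


cidr = to_single_host_cidr("34.91.100.45")
print(cidr)  # 34.91.100.45/32
```

A `/32` network contains exactly one address, which is why it limits access to the VM alone.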

In [None]:
from google.cloud import secretmanager
import pandas as pd
from currensee.utils.db_utils import create_pg_engine

## IMPORTANT: .env Configuration

For the DB credentials to load properly, you need a file located at

`<fl>_currensee/currensee/.env`

with the following contents:

```bash
GOOGLE_API_KEY


```
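
Before instantiating `Settings`, it can help to confirm that the file exists and defines the expected keys. This is a minimal sketch using only the standard library: `missing_env_keys` is a hypothetical helper, the parsing is deliberately naive (plain `KEY=value` lines), and it makes no claim about how `Settings` itself reads the file.

```python
import tempfile
from pathlib import Path


def missing_env_keys(env_path: Path, required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty in a .env file."""
    if not env_path.is_file():
        return list(required)
    defined = {}
    for line in env_path.read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        defined[key.strip()] = value.strip()
    return [key for key in required if not defined.get(key)]


# Demo on a throwaway file; point env_path at your real .env instead
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("GOOGLE_API_KEY=placeholder-value\n")
    print(missing_env_keys(env_path, ["GOOGLE_API_KEY"]))  # []
```

An empty list means every required key has a non-empty value.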



In [None]:
from currensee.core.settings import Settings

settings = Settings()

## Create SQLAlchemy engine

In [None]:
db_name = "outlook"
engine = create_pg_engine(db_name)

# Load Fake Data

The data was generated with aistudio.google.com because, when the model was run locally, the context window ran out before all of the emails were generated.

The prompts used to produce the data are below:

email_prompt = """

**Objective:** Generate a large volume of synthetic email correspondence between a financial advisor at Bankwell Financial and representatives of her client companies (listed below), presented strictly in CSV format with specific columns.

**Characters & Companies:**

*   **Financial Advisor:** Jane Moneypenny (Financial Advisor, Bankwell Financial) - Email: `jane.moneypenny1@outlook.com`
*   **Client Representatives & Companies (Based on `janes_clients.csv`):**
    *   **AbbVie (Healthcare):** Cynthia Hobbs (Director) - `cynthia.hobbs@abbvie.com`, `(446)673-8121x90878`
    *   **AeroVironment (Manufacturing):** Jennifer Phelps (Senior Director) - `jennifer.phelps@aerovironment.com`, `797-584-0061x89137`
    *   **Amedisys (Healthcare):** Kyle Waters (Senior Director) - `kyle.waters@amedisys.com`, `590.239.3215x8014`
    *   **Celestica (Technology):** Denise Moore (Senior Director) - `denise.moore@celestica.com`, `754-579-1511x763`
    *   **Compass (RealEstate):** Adam Clay (VP) - `adam.clay@compass.com`, `-2931` *(Note: Treat the duplicate entry as one contact)*
    *   **GameStop Corp (Retail):** Amy Winters (Senior Director) - `amy.winters@gamestopcorp.com`, `381-842-2729x61450`
    *   **Guardant Health (Healthcare):** Roberto Martin (VP) - `roberto.martin@guardanthealth.com`, `420.200.4573x07741`
    *   **Hasbro (Retail):** Jessica Palmer (VP) - `jessica.palmer@hasbro.com`, `909.878.8329x31984`
    *   **Hyatt Hotels (Hospitality):** Timothy Ochoa (Manager) - `timothy.ochoa@hyatthotels.com`, `315-583-8080`
    *   **Intuitive Surgical (Healthcare):** Michelle Jenkins (Director) - `michelle.jenkins@intuitivesurgical.com`, `(612)342-3657x6255`
    *   **Ladder Capital Corp (RealEstate):** Ronnie Gray (Director) - `ronnie.gray@laddercapitalcorp.com`, `001-750-645-5770x695`
    *   **Lockheed Martin Corporation (Manufacturing):** Lisa Kennedy (Senior Director) - `lisa.kennedy@lockheedmartincorporation.com`, `+1-341-221-9798x577`
    *   **ManpowerGroup (Manufacturing):** David Moreno (Director) - `david.moreno@manpowergroup.com`, `001-945-497-7155x003` *(Note: Treat the duplicate entry as one contact)*
    *   **Mariott (Hospitality):** Mary Vasquez (Manager) - `mary.vasquez@mariott.com`, `520-618-5303x572` *(Note: Using CSV spelling 'Mariott')*
    *   **Matson (Manufacturing):** Anna Lawrence (Director) - `anna.lawrence@matson.com`, `+1-675-665-6673x605` *(Note: Treat the duplicate entry as one contact)*
    *   **Medtronic (Healthcare):** Tracey Smith (Director) - `tracey.smith@medtronic.com`, `+1-952-819-1211x857`
    *   **Presidio Property Trust (RealEstate):** Kelly Smith (Senior Director) - `kelly.smith@presidiopropertytrust.com`, `001-671-821-6029`

**Output Format:**

*   Strictly CSV (Comma Separated Values).
*   Use the following standard header row: `email_timestamp,to_names,to_emails,from_name,from_email,email_subject,email_body`

**CSV Column Definitions:**

1.  `email_timestamp`: The simulated date and time the email was sent.
    *   Format: Use a consistent format like `YYYY-MM-DD HH:MM:SS`.
    *   Chronology: Timestamps must be strictly chronological across all emails.
    *   Timespan: Cover a period from approximately mid-2018 to late-2023.
2.  `to_names`: The full name(s) of the email recipient(s). Use the full names as listed above (or Jane Moneypenny).
    *   If multiple recipients, names should be comma-separated (e.g., `"Cynthia Hobbs,Jane Moneypenny"`). Enclose in quotes if contains comma.
3.  `to_emails`: The email address(es) corresponding to the recipient(s) listed in `to_names`. Use Jane's specified email or the client emails from the list.
    *   If multiple recipients, email addresses should be comma-separated, maintaining the order from `to_names` (e.g., `"cynthia.hobbs@abbvie.com,jane.moneypenny1@outlook.com"`). Enclose in quotes if contains comma.
4.  `from_name`: The full name of the single email sender (e.g., "Jane Moneypenny", "Cynthia Hobbs"). Use the full names as listed above.
5.  `from_email`: The email address corresponding to the sender listed in `from_name`. This will be Jane's email (`jane.moneypenny1@outlook.com`) or the client contact's email from the list.
6.  `email_subject`: The subject line of the email (e.g., "Q3 Cash Management Strategy Review - AbbVie"). This column should contain *only* the subject text.
7.  `email_body`: The full text content of the email *excluding* the subject line.
    *   **Crucially:** Enclose the entire body text in double quotes (`"`) to properly handle commas, line breaks, and other special characters within the email body itself, ensuring valid CSV format.
    *   The body should start directly with the salutation (e.g., "Hi Cynthia,").
    *   Include realistic sign-offs (e.g., "Best regards,").
    *   **Mandatory:** Include realistic email signatures within the body for *both* the sender and recipient.
        *   Jane's signature must include her name, title, "Bankwell Financial", fictional phone number, and her specific email: `jane.moneypenny1@outlook.com`.
        *   Client representative signatures must include their full name, title, company name, and the email/phone details *provided in the list above*.

**Content Requirements for Emails:**

*   **Quantity:** Generate approximately **300-350** emails in total, ensuring multiple, extended interaction threads exist for *each client company* over the timespan.
*   **Realism & Detail:** Emails should reflect typical B2B financial advisory communication. They should be reasonably detailed, discussing specific (but fictional) corporate finance scenarios, market conditions relevant to businesses, Bankwell Financial services, proposed strategies, follow-ups, meeting logistics, etc. Avoid overly simplistic or generic messages.
*   **Conversation Flow:** Emails between Jane and representatives of a specific client company must form logical conversation threads. A reply should clearly relate to the preceding email's subject and body in that thread. The `to_names`, `to_emails`, `from_name`, `from_email` fields must accurately reflect the sender and recipient(s) for each specific email in the thread.
*   **Multiple Contacts:**
    *   Vary correspondence. While most interactions will likely be 1-on-1 between Jane and the listed contact for a company (due to the provided data), sometimes simulate scenarios involving multiple recipients *from the same company*.
    *   When an email involves multiple recipients (e.g., Jane emailing a client contact and hypothetically CC'ing a colleague, or a client contact replying to Jane and CC'ing a colleague):
        *   List *all* recipient names in the `to_names` column, comma-separated.
        *   List *all* corresponding recipient emails in the `to_emails` column, comma-separated, in the same order.
        *   *Constraint:* Only include Jane and contacts from the *same client company* in a single email's `to_names`/`to_emails`. Do not mix contacts from different *client* companies.
        *   *(Optional realism)*: The email body's salutation might address the primary recipient, or mention the CC'd individual (e.g., "Hi Cynthia (and team),").
*   **Topics (B2B Focus):** Cover a variety of relevant corporate finance topics, tailoring discussion topics somewhat to the client company's *industry* (e.g., Healthcare, Manufacturing, Real Estate, Technology, Retail, Hospitality) as listed in the provided client data, while still covering a broad range of corporate finance issues:
    *   Initial introductions and discussions about Bankwell Financial's services.
    *   Detailed corporate cash management strategies.
    *   Analysis of short-term investment options for corporate liquidity.
    *   Business loans, lines of credit, venture debt considerations.
    *   Currency exchange services, hedging strategies (especially relevant for manufacturing/global companies).
    *   Investment management proposals for corporate reserves.
    *   Guidance on employee benefit plans (e.g., 401k).
    *   Discussions on interest rate risk, market risk, economic updates.
    *   Fee structures, service agreements, onboarding, compliance.
    *   Regular performance reviews, quarterly updates, annual planning.
    *   Handling operational issues or client service requests.
*   **Relationship Evolution:** Show professional relationships developing over the 5+ year span, building trust, moving from initial setup to ongoing management and tackling more complex strategic financial issues.

**Example Row Structure (Using new columns):**

*Example 1: Jane emails Cynthia*
`2021-05-10 14:30:00,"Cynthia Hobbs","cynthia.hobbs@abbvie.com","Jane Moneypenny","jane.moneypenny1@outlook.com","Re: Q2 Cash Flow Projections - AbbVie","Hi Cynthia,\n\nThanks for sending over the updated Q2 cash flow projections for AbbVie. I've reviewed them against the investment strategy we discussed last month.\n\nThe short-term liquidity seems well-covered by the current money market allocation. Regarding the anticipated surplus in late June, we could consider deploying that into the slightly higher-yield commercial paper option we modelled. The current market rates are favourable.\n\nCould we schedule a brief call early next week to finalize this?\n\nBest regards,\n\n--\nJane Moneypenny\nFinancial Advisor\nBankwell Financial\nPhone: (555) 123-4567\nEmail: jane.moneypenny1@outlook.com\n"`

*Example 2: Cynthia replies to Jane*
`2021-05-11 09:00:00,"Jane Moneypenny","jane.moneypenny1@outlook.com","Cynthia Hobbs","cynthia.hobbs@abbvie.com","Re: Q2 Cash Flow Projections - AbbVie","Hi Jane,\n\nThanks for the quick review. Yes, let's discuss the commercial paper option. Does Tuesday at 10 AM work for you?\n\nBest,\n\n--\nCynthia Hobbs\nDirector\nAbbVie\nEmail: cynthia.hobbs@abbvie.com\nPhone: (446)673-8121x90878\n"`

*Example 3: Cynthia replies to Jane and hypothetically CCs a colleague 'Bob Finance <bob.finance@abbvie.com>'*
`2021-05-11 09:05:00,"Jane Moneypenny,Bob Finance","jane.moneypenny1@outlook.com,bob.finance@abbvie.com","Cynthia Hobbs","cynthia.hobbs@abbvie.com","Re: Q2 Cash Flow Projections - AbbVie","Hi Jane,\n\n(CC'ing Bob from our finance team)\n\nThanks again. Tuesday at 10 AM is confirmed from our side as well.\n\nBest,\n\n--\nCynthia Hobbs\nDirector\nAbbVie\nEmail: cynthia.hobbs@abbvie.com\nPhone: (446)673-8121x90878\n"`
*(Note: The example correctly shows comma-separated values in `to_names` and `to_emails` when multiple recipients are involved. Ensure client signatures in the body use the specific details from the list provided.)*

**Final Instruction:** Please output *only* the raw CSV data, starting with the header row (`email_timestamp,to_names,to_emails,from_name,from_email,email_subject,email_body`), adhering strictly to the format and content requirements described above. Do not include any introductory text, explanations, or summaries before or after the CSV data. Ensure all `email_body` content is properly enclosed in double quotes, and fields containing commas (like multi-recipient `to_names` or `to_emails`) are also enclosed in double quotes as needed for valid CSV.





"""

In [None]:
emails_df = pd.read_csv("fake_emails.csv")

In [None]:
emails_df.head()
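
Because the file is model-generated, it may be worth checking it against the format the prompt asks for before loading it into the database. The sketch below is one possible check using the standard `csv` and `datetime` modules. `validate_email_rows` is a hypothetical helper, and the two inline rows are made-up stand-ins for the contents of `fake_emails.csv`.

```python
import csv
import io
from datetime import datetime

EXPECTED_HEADER = [
    "email_timestamp", "to_names", "to_emails",
    "from_name", "from_email", "email_subject", "email_body",
]


def validate_email_rows(csv_text: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_HEADER:
        return [f"unexpected header: {reader.fieldnames}"]
    previous = None
    for row_no, row in enumerate(reader, start=1):
        # Extra fields land under the key None; missing fields get the value None
        if None in row or None in row.values():
            problems.append(f"row {row_no}: wrong number of fields")
            continue
        try:
            sent = datetime.strptime(row["email_timestamp"], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            problems.append(f"row {row_no}: bad timestamp")
            continue
        # The prompt asks for strictly chronological timestamps
        if previous is not None and sent <= previous:
            problems.append(
                f"row {row_no}: timestamp is not strictly after the previous row"
            )
        previous = sent
        # to_names and to_emails should list the same number of recipients
        if len(row["to_names"].split(",")) != len(row["to_emails"].split(",")):
            problems.append(f"row {row_no}: recipient name/email count mismatch")
    return problems


# Made-up sample: the second row is dated before the first
sample = (
    ",".join(EXPECTED_HEADER) + "\n"
    '2021-05-11 09:00:00,"A One","a.one@example.com","B Two",'
    '"b.two@example.com","Hello","Hi A,\n\nThanks."\n'
    '2021-05-10 14:30:00,"B Two","b.two@example.com","A One",'
    '"a.one@example.com","Re: Hello","Hi B,"\n'
)
print(validate_email_rows(sample))
```

To run it on the real file, pass the file's text, for example `validate_email_rows(open("fake_emails.csv", newline="").read())`.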

meeting_prompt = """

**Objective:** Generate a synthetic dataset of meetings between financial advisor Jane Moneypenny and her clients, logically derived from the topics and scheduling interactions suggested in the previously generated email correspondence dataset (covering mid-2018 to late-2023). The output must be strictly in CSV format.

**Context:**
*   Base the meetings on the interactions, confirmed meeting times, and discussion topics evident in the prior email dataset involving Jane Moneypenny (`jane.moneypenny1@outlook.com`) and her clients (AbbVie, AeroVironment, Amedisys, Celestica, Compass, GameStop Corp, Guardant Health, Hasbro, Hyatt Hotels, Intuitive Surgical, Ladder Capital Corp, Lockheed Martin Corporation, ManpowerGroup, Mariott, Matson, Medtronic, Presidio Property Trust, using the specific contact names and emails provided previously).
*   Meetings should only be generated where the email history suggests a meeting was scheduled or would logically occur (e.g., following a proposal, quarterly review scheduling, specific discussion requests).

**Output Format:**

*   Strictly CSV (Comma Separated Values).
*   Use the following standard header row: `meeting_timestamp,host,host_email,invitees,invitee_emails,meeting_subject`

**CSV Column Definitions:**

1.  `meeting_timestamp`: The simulated date and time the meeting occurred.
    *   Format: Use a consistent format like `YYYY-MM-DD HH:MM:SS`.
    *   Chronology: Timestamps must be strictly chronological across all meetings.
    *   Timespan: Cover the period from mid-2018 to late-2023 for historical meetings, reflecting the email data.
    *   Time Constraints: Meetings must occur **Monday to Friday, between 9:00 AM and 5:00 PM**. Assume Eastern Time (ET) for scheduling unless client context strongly implies otherwise.
2.  `host`: The full name of the meeting host. This will **always** be "Jane Moneypenny".
3.  `host_email`: The email address of the host. This will **always** be `jane.moneypenny1@outlook.com`.
4.  `invitees`: The full name(s) of the client representative(s) attending the meeting. Use the client names from the provided list.
    *   Only include contacts from a **single client company** per meeting.
    *   If hypothetical emails involved multiple contacts from the same company being CC'd *and* the meeting was implied for both, list names comma-separated (e.g., `"Cynthia Hobbs,Bob Finance"`). Otherwise, list the primary contact involved (e.g., "Cynthia Hobbs"). Enclose in quotes if contains comma. *Based on the previous email generation focusing mostly on 1:1, expect single names predominantly.*
5.  `invitee_emails`: The email address(es) corresponding to the invitee(s) listed in `invitees`. Use the client emails from the provided list.
    *   If multiple invitees, email addresses should be comma-separated, maintaining the order from `invitees` (e.g., `"cynthia.hobbs@abbvie.com,bob.finance@abbvie.com"`). Enclose in quotes if contains comma.
6.  `meeting_subject`: The subject line or primary topic of the meeting.
    *   This subject **must logically derive** from the email conversations occurring around the `meeting_timestamp`. Use subjects suggested or confirmed in the emails (e.g., "AbbVie - Introductory Call", "AeroVironment - Discuss FX Forward Pricing", "Compass - Credit Facility Term Sheet Review", "Quarterly Portfolio Review - Medtronic", "Lockheed Martin - LDI Strategy Presentation").

**Content Requirements for Meetings:**

*   **Derivation from Emails:** Generate meetings primarily based on explicit scheduling found in the previous email dataset (e.g., "Tuesday at 2 PM CT works perfectly"). Place the `meeting_timestamp` accurately based on such confirmations.
*   **Logical Cadence:** Infer meetings where highly probable even if not explicitly confirmed minute-by-minute (e.g., quarterly reviews for active investment clients, kick-off meetings after a proposal acceptance, follow-up discussions after complex information sharing). The cadence should be realistic – not every email leads to a meeting. Expect recurring meetings (like quarterly reviews) for some clients, and ad-hoc meetings for others based on specific needs (financing, hedging, implementation).
*   **Quantity (Historical):** Generate a realistic number of meetings across the mid-2018 to late-2023 timeframe, reflecting the ~300-350 emails previously generated. This might be in the range of 50-100 meetings, depending on the nature of interactions.
*   **Future Meetings:**
    *   Generate **10-20 additional meeting rows** with `meeting_timestamp` values falling in **early 2024 (e.g., January to March 2024)**.
    *   These future meetings must represent **plausible next steps or continuations** of the relationships and topics observed towards the end of the 2023 email/meeting data.
    *   `meeting_subject` for future meetings should reflect logical follow-ups (e.g., "Q4 2023 Portfolio Review - Guardant Health", "Discuss H1 2024 FX Hedging - Hasbro", "Hyatt Hotels - 401k RFI Discussion", "AeroVironment - SCF Pilot Kick-off", "Intuitive Surgical - FX Services Deep Dive").
    *   Invitees/Emails for future meetings should be consistent with the client contacts established.

**Example Row Structure:**

`2018-07-17 15:00:00,"Jane Moneypenny","jane.moneypenny1@outlook.com","Cynthia Hobbs","cynthia.hobbs@abbvie.com","AbbVie - Introductory Call & Bankwell Services Overview"`
`2020-01-30 14:00:00,"Jane Moneypenny","jane.moneypenny1@outlook.com","Ronnie Gray","ronnie.gray@laddercapitalcorp.com","Ladder Capital Corp - Q4 2019 Portfolio Review"`
`2023-09-19 14:00:00,"Jane Moneypenny","jane.moneypenny1@outlook.com","Denise Moore","denise.moore@celestica.com","Celestica - Discuss Short-Term Investment Strategy (CDs/CP)"`
`2024-01-15 10:00:00,"Jane Moneypenny","jane.moneypenny1@outlook.com","Jessica Palmer","jessica.palmer@hasbro.com","Hasbro - Finalize H1 2024 FX Hedging Plan"`

**Final Instruction:** Please output *only* the raw CSV data, starting with the header row (`meeting_timestamp,host,host_email,invitees,invitee_emails,meeting_subject`), adhering strictly to the format and content requirements described above. Do not include any introductory text, explanations, or summaries before or after the CSV data. Ensure fields containing commas (like potential multi-invitee names/emails) are properly enclosed in double quotes.

---


"""

In [None]:
meetings_df = pd.read_csv("fake_meetings.csv")

In [None]:
meetings_df.head()
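
The meeting prompt restricts meetings to Monday through Friday, between 9:00 AM and 5:00 PM, and a generated file may not always honor that. The following sketch flags rows that fall outside the window. `off_hours_meetings` is a hypothetical helper, the sample rows are invented, and the timestamps are treated as naive local times with no time zone conversion.

```python
import csv
import io
from datetime import datetime, time


def off_hours_meetings(csv_text: str) -> list[str]:
    """Return the timestamps of meetings outside Mon-Fri, 9:00 AM to 5:00 PM."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = datetime.strptime(row["meeting_timestamp"], "%Y-%m-%d %H:%M:%S")
        is_weekday = start.weekday() < 5  # Monday is 0, Sunday is 6
        in_hours = time(9, 0) <= start.time() <= time(17, 0)
        if not (is_weekday and in_hours):
            flagged.append(row["meeting_timestamp"])
    return flagged


# Invented sample: one Saturday meeting, one valid meeting, one after hours
sample = (
    "meeting_timestamp,host,host_email,invitees,invitee_emails,meeting_subject\n"
    '2024-01-13 10:00:00,"H","h@example.com","I","i@example.com","Saturday call"\n'
    '2024-01-15 10:00:00,"H","h@example.com","I","i@example.com","Monday call"\n'
    '2024-01-16 18:30:00,"H","h@example.com","I","i@example.com","Late call"\n'
)
flagged = off_hours_meetings(sample)
print(flagged)  # ['2024-01-13 10:00:00', '2024-01-16 18:30:00']
```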

## Load to database

In [None]:
emails_df.to_sql("email_data", engine, if_exists="replace", index=False)

In [None]:
meetings_df.to_sql("meeting_data", engine, if_exists="replace", index=False)

## Inspect the Data

In [None]:
df = pd.read_sql("SELECT * from email_data", con=engine)
df.head()

In [None]:
df = pd.read_sql("SELECT * from meeting_data", con=engine)
df.tail()

In [None]:
import datetime
import pytz

print(
    f"Notebook last execution time: {datetime.datetime.now(pytz.timezone('US/Central')).strftime('%a, %d %B %Y %H:%M:%S')}"
)