<img src="banner-chatgpt.png" width="100%">
<br>

To establish your OpenAI connection, run the first two cells.
<br>
To export your chat, use File | Download As and choose either:
* Jupyter Notebook format (.ipynb), or
* Markdown format (.md).

In [20]:
import openai

openai.api_base = 'http://aitools.cs.vt.edu:7860/openai/v1'
openai.api_key = 'aitools'

In [21]:
def get_completion(prompt, model="gpt-3.5-turbo"):
	messages = [{"role": "user", "content": prompt}]

	response = openai.ChatCompletion.create(
		model=model,
		messages=messages,
		temperature=0,  # this is the degree of randomness of the model's output
	)
	return response.choices[0].message["content"]


def converse(prompt, messages=None, model="gpt-3.5-turbo", max_tokens=3000, temperature=0, top_p=1, frequency_penalty=0,
			 presence_penalty=0):
	# Add the user's message to the list of messages
	if messages is None:
		messages = []

	messages.append({"role": "user", "content": prompt})

	response = openai.ChatCompletion.create(
		model=model,
		messages=messages,
		temperature=temperature,
		max_tokens=max_tokens,
		top_p=top_p,
		frequency_penalty=frequency_penalty,
		presence_penalty=presence_penalty,
	).choices[0].message["content"]

	# Add the assistant's message to the list of messages
	messages.append({"role": "assistant", "content": response})

	return response, messages


## Example 1: Simple chat

Feel free to remove these cells before you export your chat.

In [22]:
prompt = "How are you feeling today?"


In [23]:
response = get_completion(prompt)
print(response)


As an AI, I don't have feelings, but I'm here to help you with any questions or tasks you have. How can I assist you today?


## Example 2: Conversation
Let's maintain a conversation history and see how the model responds to it.
Feel free to remove these cells before you export your chat.




In [24]:
messages = []
prompt = "Hi my name is Jennifer."
response, messages = converse(prompt, messages)
print(f"You: {prompt}\nLLM: {response}")
prompt = "What is my name?"
response, messages = converse(prompt, messages)
print(f"You: {prompt}\nLLM: {response}")

You: Hi my name is Jennifer.
LLM: Hello Jennifer! How can I assist you today?
You: What is my name?
LLM: Your name is Jennifer.
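
The second answer is possible because `converse` resends the whole `messages` list on every call; the model keeps no state between requests. A minimal offline sketch of that bookkeeping follows. `fake_model` and `converse_offline` are hypothetical stand-ins for the real API call and are not part of this notebook:

```python
def fake_model(messages):
    """Hypothetical stand-in for the chat API: reports how many messages it was sent."""
    return f"I received {len(messages)} message(s)."


def converse_offline(prompt, messages=None):
    """Mirror the bookkeeping of converse() without any network call."""
    if messages is None:
        messages = []
    # Record the user's turn before "calling" the model.
    messages.append({"role": "user", "content": prompt})
    reply = fake_model(messages)
    # Record the assistant's turn so the next call sees it too.
    messages.append({"role": "assistant", "content": reply})
    return reply, messages


history = []
first_reply, history = converse_offline("Hi my name is Jennifer.", history)
second_reply, history = converse_offline("What is my name?", history)

for message in history:
    print(f"{message['role']}: {message['content']}")
```

Each call adds two entries, so the history grows with every turn and eventually counts against the model's context limit.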


In [25]:
response, messages = converse("What is my name?", messages)

In [26]:
prompt = f"""
Act as a Python developer and develop code for an unsupervised machine learning project.
Follow the instructions below to generate the code for the project.

```
# Load all data files matching 'data/datasets/train-00000-5k-2*.parquet'.
# Merge those dataframes into a single dataframe "df".
```

"""

response, messages = converse(prompt, messages)

In [27]:
print(response)

Certainly! Here's an example code snippet in Python that demonstrates how to load and merge multiple Parquet files into a single dataframe using the pandas library:

```python
import pandas as pd
import glob

# Define the path to the directory containing the Parquet files
file_path = 'data/datasets/train-00000-5k-2*.parquet'

# Get a list of all file paths matching the pattern
file_list = glob.glob(file_path)

# Initialize an empty list to store the dataframes
dfs = []

# Iterate over each file path and load the corresponding dataframe
for file in file_list:
    df = pd.read_parquet(file)
    dfs.append(df)

# Merge all dataframes into a single dataframe
df = pd.concat(dfs)

# Now you can perform further operations on the merged dataframe
# such as preprocessing, feature engineering, or unsupervised learning algorithms
```

Make sure to replace `'data/datasets/train-00000-5k-2*.parquet'` with the actual path to your Parquet files. This code will load all the Parquet files matching the 
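
To see which files a wildcard such as `train-00000-5k-2*.parquet` would select, the pattern can be checked against names without touching the disk. The sketch below uses the standard-library `fnmatch` module; the file names are invented for illustration and are not taken from the actual dataset directory:

```python
from fnmatch import fnmatch

pattern = "train-00000-5k-2*.parquet"

# Hypothetical file names, not taken from the actual dataset directory.
candidates = [
    "train-00000-5k-2.parquet",
    "train-00000-5k-20.parquet",
    "train-00000-5k-3.parquet",
    "train-00000-5k-2.csv",
]

# sorted() gives a stable order; glob.glob returns matches in arbitrary order.
matched = sorted(name for name in candidates if fnmatch(name, pattern))
print(matched)
```

When the matched frames are concatenated, passing `ignore_index=True` to `pd.concat` gives the merged dataframe a fresh index instead of repeating each file's row labels.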

In [28]:
prompt = f"""
Continue generating code for your project

```
# Define the code snippets (from the dataframe's "code" attribute)
# Define labels (from dataframe's "language" attribute)

# Convert code snippets to TF-IDF features
```
"""

response, messages = converse(prompt, messages)

In [29]:
print(response)

Certainly! Here's the continuation of the code snippet for your unsupervised machine learning project:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Extract the code snippets from the "code" attribute of the dataframe
code_snippets = df['code'].values

# Extract the labels from the "language" attribute of the dataframe
labels = df['language'].values

# Convert code snippets to TF-IDF features
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(code_snippets)
```

In this code snippet, we use the `TfidfVectorizer` from the `sklearn.feature_extraction.text` module to convert the code snippets into TF-IDF features. We extract the code snippets from the "code" attribute of the dataframe and store them in the `code_snippets` variable. Similarly, we extract the labels from the "language" attribute of the dataframe and store them in the `labels` variable.

Then, we initialize a `TfidfVectorizer` object and use its `fit_transform` method to convert th
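
The weighting behind `fit_transform` can be sketched in plain Python. The version below is a simplified illustration, not scikit-learn's implementation. It assumes smoothed IDF, L2-normalized rows, and tokens of two or more word characters, which are believed to match the `TfidfVectorizer` defaults. The snippets are made up:

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep word tokens of two or more characters."""
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf(documents):
    """Return (vocabulary, rows) using smoothed IDF and L2-normalized rows."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted({term for count in counts for term in count})
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = {term: sum(1 for count in counts if term in count) for term in vocabulary}
    # Smoothed IDF: terms found in every document get the minimum weight of 1.
    idf = {term: math.log((1 + n_docs) / (1 + doc_freq[term])) + 1 for term in vocabulary}

    rows = []
    for count in counts:
        row = [count[term] * idf[term] for term in vocabulary]
        norm = math.sqrt(sum(value * value for value in row))
        rows.append([value / norm if norm else 0.0 for value in row])
    return vocabulary, rows


# Hypothetical snippets, not taken from the dataset.
snippets = [
    "def add(a, b): return a + b",
    "def sub(a, b): return a - b",
    "function add(a, b) { return a + b; }",
]
vocabulary, rows = tfidf(snippets)
for term, weight in zip(vocabulary, rows[0]):
    if weight:
        print(f"{term}: {weight:.3f}")
```

A term that appears in every snippet, such as `return` here, ends up with a lower weight than a term shared by only some of them.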

In [30]:
prompt = f"""
Continue generating code for your project

```
# Perform K-Means clustering
```
"""

response, messages = converse(prompt, messages)

In [31]:
print(response)

Certainly! Here's the continuation of the code snippet for your unsupervised machine learning project:

```python
from sklearn.cluster import KMeans

# Perform K-Means clustering
num_clusters = 5  # Define the number of clusters you want to create
kmeans = KMeans(n_clusters=num_clusters)
kmeans.fit(features)

# Get the cluster labels for each data point
cluster_labels = kmeans.labels_
```

In this code snippet, we use the `KMeans` class from the `sklearn.cluster` module to perform K-Means clustering on the TF-IDF features. We define the number of clusters we want to create using the `num_clusters` variable.

Then, we initialize a `KMeans` object with the specified number of clusters and call its `fit` method to perform the clustering. The `fit` method takes the TF-IDF features as input.

After the clustering is performed, we can obtain the cluster labels for each data point using the `labels_` attribute of the `KMeans` object. These cluster labels are stored in the `cluster_labels` var
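
What `fit` does can be illustrated with a bare-bones version of Lloyd's algorithm, the classic K-Means procedure. This is a sketch for intuition only, with fixed starting centroids and made-up 2-D points; scikit-learn's `KMeans` adds smarter initialization and other refinements:

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D points forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Because scikit-learn chooses its starting centroids randomly, passing `random_state` to `KMeans` makes the cluster assignments repeatable between runs.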

In [32]:
prompt = f"""
Continue generating code for your project

```
# For each cluster, output the cluster labels, along with 5 samples of code snippets from that cluster
# Print the cluster assignments, with each entry truncated to 50 characters
```
"""

response, messages = converse(prompt, messages)

In [33]:
print(response)

Certainly! Here's the continuation of the code snippet for your unsupervised machine learning project:

```python
# For each cluster, output the cluster labels, along with 5 samples of code snippets from that cluster
for cluster_id in range(num_clusters):
    cluster_samples = code_snippets[cluster_labels == cluster_id][:5]
    cluster_samples = [sample[:50] + '...' if len(sample) > 50 else sample for sample in cluster_samples]
    print(f"Cluster {cluster_id}:")
    print("Cluster Labels:", labels[cluster_labels == cluster_id])
    print("Code Snippets:")
    for sample in cluster_samples:
        print(sample)
    print()

# Print the cluster assignments with each cluster being truncated to 50 characters long
cluster_assignments = [str(label)[:50] + '...' if len(str(label)) > 50 else str(label) for label in cluster_labels]
print("Cluster Assignments:")
print(cluster_assignments)
```

In this code snippet, we iterate over each cluster and output the cluster labels along with 5 samples
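
Printing every label in a cluster can produce very long output on a large dataframe. One alternative is to count the languages per cluster and show a few truncated samples. The sketch below uses made-up data standing in for `code_snippets`, `labels`, and `cluster_labels`:

```python
from collections import Counter, defaultdict

# Hypothetical data standing in for code_snippets, labels and cluster_labels.
code_snippets = [
    "print('hello')",
    "console.log('hello');",
    "def very_long_function_name_for_demo(argument_one, argument_two): pass",
    "let x = 1;",
]
labels = ["Python", "JavaScript", "Python", "JavaScript"]
cluster_labels = [0, 1, 0, 1]


def truncate(text, limit=50):
    """Shorten text to `limit` characters, marking the cut with '...'."""
    return text[:limit] + "..." if len(text) > limit else text


# Group (language, snippet) pairs by the cluster they were assigned to.
clusters = defaultdict(list)
for snippet, language, cluster_id in zip(code_snippets, labels, cluster_labels):
    clusters[cluster_id].append((language, snippet))

summary = {}
for cluster_id in sorted(clusters):
    members = clusters[cluster_id]
    # Count how often each language occurs in this cluster.
    summary[cluster_id] = Counter(language for language, _ in members)
    print(f"Cluster {cluster_id}: {dict(summary[cluster_id])}")
    for _, snippet in members[:5]:
        print("  " + truncate(snippet))
```

A cluster dominated by one language suggests the TF-IDF features separate that language well; a mixed count suggests they do not.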