<a href="https://colab.research.google.com/github/Narichie/Large-Language-Model/blob/main/Data_Science_Profiles_%26_Interview_Preparation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Data Science Profiles and Interview Preparation Aided by a Large Language Model**

#### **Problem Definition**

Designing Data Science Profiles and Interview Preparation Aided by an LLM

#### **Background and Context**

Opportunities for data scientists and engineers are set to rise as companies leverage abundant data and technologies like generative AI to enhance their competitive edge. This trend is evidenced by the projected growth of data science jobs in the U.S., which is expected to outpace the national average. However, competition is increasing due to the growing number of graduates entering the field, with many universities launching programs in Data Analysis, Data Science, and related areas. If you're preparing for an interview, consider the necessary skills and typical profiles in this expanding market.

#### **Goal**

**To create a connection to an LLM** for knowledge extraction using a Jupyter notebook and Hugging Face.

**Prepare prompts to investigate:**
* Specific steps to prepare for a data science interview.
* The top five questions recruiters commonly ask entry-level data scientists.
* Descriptive profiles for a set of five senior data scientists.

### **1. Advanced Prompt Engineering Techniques Using Llama 2 13B**

**Model Development and necessary installations and imports**

In [None]:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.2.28 --force-reinstall --upgrade --no-cache-dir --verbose -q 2>/dev/null

Collecting llama-cpp-python==0.2.28
  Downloading llama_cpp_python-0.2.28.tar.gz (9.4 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python==0.2.28)
  Downloading typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python==0.2.28)
  Downloading numpy-2.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (62 kB)

In [None]:
!pip install huggingface_hub==0.23.2 -q 2>/dev/null

In [None]:
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

In [None]:
# Model configuration
model_name_or_path = "TheBloke/Llama-2-13B-chat-GGUF"
model_basename = "llama-2-13b-chat.Q5_K_M.gguf"

# Download the quantized model file from the Hugging Face Hub (cached after the first run)
model_path = hf_hub_download(
    repo_id=model_name_or_path,
    filename=model_basename,
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


llama-2-13b-chat.Q5_K_M.gguf:   0%|          | 0.00/9.23G [00:00<?, ?B/s]

In [None]:
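# Load the GGUF model with llama-cpp-python
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=2,       # CPU threads
    n_batch=512,       # prompt tokens processed per batch
    n_gpu_layers=43,   # layers offloaded to the GPU
    n_ctx=4096,        # context window in tokens
)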

AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


In [None]:
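def generate_llama_response(user_prompt):
    """Send user_prompt to the loaded Llama 2 model and return the generated text."""

    # System message
    system_message = """
    [INST]<<SYS>> Respond to the user question based on the user prompt<</SYS>>[/INST]
    """

    # Combine user_prompt and system_message to create the prompt
    prompt = f"{user_prompt}\n{system_message}"

    # Generate a response from the Llama 2 model
    response = lcpp_llm(
        prompt=prompt,
        max_tokens=600,       # upper bound on the number of generated tokens
        temperature=0,        # greedy decoding for repeatable output
        top_p=0.95,
        repeat_penalty=1.2,
        top_k=50,
        stop=['INST'],        # stop if the model starts emitting another instruction tag
        echo=False            # return only the completion, not the prompt
    )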

    # Extract and return the response text
    response_text = response["choices"][0]["text"]
    return response_text
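
The function above places the user prompt before an `[INST]`-wrapped system message. For comparison, the layout commonly documented for Llama 2 chat models puts the system message inside `<<SYS>>` tags at the start of a single `[INST] ... [/INST]` turn. The sketch below shows that layout only; `build_llama2_chat_prompt` is a hypothetical helper that is not part of this notebook or of `llama_cpp`, and the leading `<s>` token is omitted on the assumption that the tokenizer adds it.

```python
def build_llama2_chat_prompt(user_prompt: str, system_message: str) -> str:
    """Return a single-turn prompt in the Llama 2 chat layout (hypothetical helper)."""
    # System text goes inside <<SYS>> tags, followed by the user text,
    # all wrapped in one [INST] ... [/INST] pair.
    return (
        f"[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n"
        f"{user_prompt.strip()} [/INST]"
    )


example_prompt = build_llama2_chat_prompt(
    user_prompt="List the top five questions a recruiter might ask an entry-level data scientist",
    system_message="Respond to the user question based on the user prompt",
)
print(example_prompt)
```

A string built this way could be passed as `prompt=` to `lcpp_llm`. The responses would likely differ from the outputs recorded in this notebook, which were produced with the original prompt layout.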


#### **Model Check**

In [None]:
# A. Data Science Interview Preparation
# 1. Top 10 steps to prepare for a data science interview

user_prompt = "Provide the top ten steps to prepare effectively for an entry-level data science interview"
response_interview_steps = generate_llama_response(user_prompt)
print(response_interview_steps)

 Sure, here are the top ten steps to prepare effectively for an entry-level data science interview:

Step 1: Review Fundamentals - Brush up on your statistics, mathematics, and computer programming fundamentals. These concepts form the foundation of data science, and being comfortable with them is essential for success in an entry-level data science role.

Step 2: Learn Programming Languages - Familiarize yourself with popular programming languages used in data science such as Python, R, SQL, and Julia. Practice writing code to solve problems related to data manipulation, analysis, and visualization.

Step 3: Study Data Structures and Algorithms - Understand the time and space complexity of various data structures and algorithms commonly used in data science. This will help you write efficient and scalable code.

Step 4: Learn Machine Learning Fundamentals - Familiarize yourself with supervised and unsupervised learning techniques, including linear regression, decision trees, clusterin

In [None]:
# 2. Top 5 questions recruiters commonly ask entry-level data scientists

user_prompt = "List the top five questions a recruiter might ask an entry-level data scientist during an interview"
response_recruiter_questions = generate_llama_response(user_prompt)
print(response_recruiter_questions)

Llama.generate: prefix-match hit


 Sure, here are the top five questions that a recruiter might ask an entry-level data scientist during an interview:

1. Can you tell us about your experience with statistical analysis and modeling? How have you applied these skills in real-world projects or applications?

This question is designed to assess the candidate's understanding of statistical concepts, their ability to apply them in practical situations, and their experience with data analysis software such as R, Python, or Excel. The recruiter wants to know if the candidate has hands-on experience working with datasets, performing statistical tests, and interpreting results.

2. How do you approach data visualization? Can you show us an example of a project where you created visualizations that effectively communicated insights to stakeholders?

This question is intended to evaluate the candidate's ability to communicate complex data insights to non-technical audiences through effective visualizations. The recruiter wants to

In [None]:
# B. Data Scientist Profiles
# JSON-based dataset describing five senior data scientist profiles
user_prompt = '''
Create a JSON dataset for the profiles of five senior data scientists with the following details:
Name, Position, Major, Years of Experience, Technical Knowledge, Annual Salary, and Description of Current Activities.

Profiles:
- John Mitchell, Senior Data Scientist – Business Major. With 10 years of experience in business intelligence, I excel in machine learning, predictive modeling, and SQL. Proficient in Python, R, and Tableau, I deliver actionable business insights and drive change management initiatives. Annual Salary: $110,000.
- Emily Zhang, Senior Data Scientist – Industrial Engineering Major. With 8 years of experience in operations research and data engineering, I specialize in machine learning and predictive modeling to optimize industrial processes. Skilled in SQL, Python, R, Tableau, and Power BI. Annual Salary: $120,000.
- Robert Carter, Senior Data Scientist – Mechanics Major. With 7 years in mechanical engineering, I apply machine learning and deep learning to predict machinery failures. I use SQL, Python, R, and Tableau for system optimization and real-time monitoring. Annual Salary: $115,000.
- Sarah Thompson, Senior Data Scientist – Mathematics Major. A data scientist with 9 years in applied mathematics, I focus on complex data modeling and financial forecasting using machine learning, SQL, Python, and R. Proficient in Tableau and Power BI for visualization. Annual Salary: $130,000.
- David Sanchez, Senior Data Scientist – Economics and Finance Major. With 12 years of experience, I specialize in financial modeling and risk assessment using machine learning and SQL. Proficient in R, Python, Tableau, and Power BI, I lead data-driven strategies and change management. Annual Salary: $135,000.
'''
response_profiles = generate_llama_response(user_prompt)
print(response_profiles)

Llama.generate: prefix-match hit


 Sure! Here is a JSON dataset for five senior data scientists with their profiles, including name, position, major, years of experience, technical knowledge, annual salary, and description of current activities:

{
"Profiles": [
{
"Name": "John Mitchell",
"Position": "Senior Data Scientist",
"Major": "Business",
"Years of Experience": 10,
"Technical Knowledge": ["Machine Learning", "Predictive Modeling", "SQL"],
"Annual Salary": "$110,000",
"Description of Current Activities": "Deliver actionable business insights and drive change management initiatives using Python, R, and Tableau."
},
{
"Name": "Emily Zhang",
"Position": "Senior Data Scientist",
"Major": "Industrial Engineering",
"Years of Experience": 8,
"Technical Knowledge": ["Machine Learning", "Predictive Modeling", "SQL"],
"Annual Salary": "$120,000",
"Description of Current Activities": "Specialize in optimizing industrial processes using machine learning and predictive modeling skills in SQL, Python, R, Tableau, and Power BI.

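Because the model wraps the JSON in conversational text, and a long answer can be cut off before the closing brace, it may help to check whether the response actually parses before using it. The sketch below is one possible check, not part of the original notebook; `extract_json_object` is a hypothetical helper and the two sample strings are illustrative, not real model output.

```python
import json


def extract_json_object(text: str):
    """Return the JSON value starting at the first '{' in text, or None if it does not parse."""
    start = text.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value beginning at `start` and ignores trailing text.
        obj, _ = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        # Typically means the output was truncated or is not valid JSON.
        return None
    return obj


# Illustrative strings only; they are not actual model output.
complete = 'Sure! Here is the dataset:\n{"Profiles": [{"Name": "John Mitchell", "Years of Experience": 10}]}'
truncated = 'Sure! Here is the dataset:\n{"Profiles": [{"Name": "John Mitchell", "Years of'

parsed = extract_json_object(complete)
print(parsed["Profiles"][0]["Name"])
print(extract_json_object(truncated))
```

A `None` result is a signal to rerun the prompt, for example with a larger `max_tokens` value or a request for fewer profiles per call.
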
#### **Conclusion**

The goal was to connect to a large language model (LLM) and extract actionable guidance on preparing for data science interviews and on typical data scientist profiles.

To achieve this, I set specific output expectations for each task and wrote concise, well-structured prompts. For example: "Provide the top ten steps to prepare effectively for an entry-level data science interview."

In formulating these prompts, I used terminology common in data science hiring. The profile prompt, for example, asks for fields such as "Technical Knowledge", "Years of Experience", and "Description of Current Activities".

I then compared the LLM's responses against my established objectives, verifying the completeness of the steps provided and assessing their coherence and suitability.
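
The completeness check described above was done by reading the responses. As a rough illustration of how part of it could be automated, the sketch below counts numbered items in a response and compares them with the number requested. The helper names and the regular expression are assumptions for illustration, and the sample text reuses only the first three step headings from the interview-preparation output.

```python
import re


def numbered_items(text: str) -> list[int]:
    """Return item numbers found at line starts, such as 'Step 3:' or '3.'."""
    pattern = re.compile(r"^[ \t]*(?:Step[ \t]+)?(\d+)[.:]", re.MULTILINE)
    return [int(n) for n in pattern.findall(text)]


def is_complete(text: str, expected_count: int) -> bool:
    """True when items 1..expected_count each appear once, in order."""
    return numbered_items(text) == list(range(1, expected_count + 1))


# Illustrative excerpt: three step headings, while ten were requested.
sample = (
    "Step 1: Review Fundamentals\n\n"
    "Step 2: Learn Programming Languages\n\n"
    "Step 3: Study Data Structures and Algorithms"
)
print(numbered_items(sample))
print(is_complete(sample, 10))
```

A check like this only covers the count and ordering of steps; coherence and suitability still need a human reading.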




Thank you