
### Step 1: Create a Table with Complex Data Types

Here is an example of a Hive `CREATE TABLE` statement with complex data types:

```sql
CREATE TABLE employee_details (
    id INT,
    name STRING,
    skills ARRAY<STRING>,
    addresses MAP<STRING, STRING>,
    job_info STRUCT<
        job_title:STRING, 
        start_date:STRING, 
        salary:DOUBLE
    >
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;
```

- **`skills`**: An array of strings representing the skills of an employee.
- **`addresses`**: A map with key-value pairs where both keys and values are strings, representing different types of addresses (e.g., home, office).
- **`job_info`**: A struct to store nested fields like job title, start date, and salary.
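
To make the three delimiters in the `ROW FORMAT` clause concrete, the following Python sketch mimics how one delimited text row maps onto these column types. It is an illustration only, not Hive's actual SerDe code; the `parse_row` helper and the sample line are assumptions made for this example. It also assumes the usual delimited-text behavior in which struct fields, like array elements and map entries, are separated by the collection-item delimiter.

```python
def parse_row(line):
    """Split one text row using the table's delimiters:
    ',' between fields, '|' between collection items, ':' between map key and value."""
    id_, name, skills, addresses, job_info = line.split(",")
    job_title, start_date, salary = job_info.split("|")
    return {
        "id": int(id_),
        "name": name,
        "skills": skills.split("|"),
        "addresses": dict(item.split(":", 1) for item in addresses.split("|")),
        "job_info": {
            "job_title": job_title,
            "start_date": start_date,
            "salary": float(salary),
        },
    }

# Hypothetical sample line written with the delimiters declared in the table
row = parse_row(
    "1,John Doe,Java|Python|SQL,home:New York|office:San Francisco,Engineer|2022-01-10|85000.0"
)
print(row["skills"])
print(row["addresses"]["office"])
print(row["job_info"]["salary"])
```

The sketch also shows why the field values themselves must not contain `,`, `|`, or `:` unless an escape character is configured: any of them would be read as a delimiter.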

### Step 2: Prepare a Sample Data File

Create a sample data file named `employee_details.txt` with the following content. With `ROW FORMAT DELIMITED`, struct fields are separated by the collection-item delimiter (`|`), just like array elements and map entries, so `job_info` is written as `job_title|start_date|salary` rather than in a JSON-like form:

```
1,John Doe,Java|Python|SQL,home:New York|office:San Francisco,Engineer|2022-01-10|85000.0
2,Jane Smith,Python|R|Hadoop,home:Chicago|office:Seattle,Data Scientist|2021-05-15|95000.0
3,Mark Johnson,Scala|Spark|Kafka,home:Boston|office:Denver,Data Engineer|2020-03-22|105000.0
```

### Step 3: Load Data into the Hive Table

Use the `LOAD DATA` statement to load data from the text file into the Hive table:

```sql
LOAD DATA LOCAL INPATH '/path/to/employee_details.txt' INTO TABLE employee_details;
```

Replace `/path/to/employee_details.txt` with the actual path where your data file is stored.

### Step 4: Query Inner Values from Complex Structures

Now, let's run some select queries to extract inner values from these complex structures:

1. **Retrieve all employee names and their first skill:**

   ```sql
   SELECT 
       name, 
       skills[0] AS first_skill 
   FROM employee_details;
   ```

2. **Get all employees' office addresses:**

   ```sql
   SELECT 
       name, 
       addresses['office'] AS office_address 
   FROM employee_details;
   ```

3. **Fetch job titles and salaries from the nested `job_info` struct:**

   ```sql
   SELECT 
       name, 
       job_info.job_title AS job_title,
       job_info.salary AS salary
   FROM employee_details;
   ```

4. **Retrieve all data for employees with "Python" as one of their skills:**

   ```sql
   SELECT * 
   FROM employee_details 
   WHERE array_contains(skills, 'Python');
   ```

These queries demonstrate how to interact with and extract data from complex structures in Hive. Make sure your Hive environment is properly configured to execute these queries successfully.
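
For readers who think in Python, the four access patterns above map roughly onto list indexing, dictionary lookup, field access, and a membership test. The sketch below models one row as plain Python objects. The `array_get` and `map_get` helpers are hypothetical names introduced here to mirror Hive returning `NULL`, rather than raising an error, for an out-of-range index or a missing key.

```python
row = {
    "name": "John Doe",
    "skills": ["Java", "Python", "SQL"],
    "addresses": {"home": "New York", "office": "San Francisco"},
    "job_info": {"job_title": "Engineer", "start_date": "2022-01-10", "salary": 85000.0},
}

def array_get(arr, index):
    """Model of skills[index]; None stands in for NULL when the index is out of range."""
    return arr[index] if 0 <= index < len(arr) else None

def map_get(mapping, key):
    """Model of addresses[key]; None stands in for NULL when the key is absent."""
    return mapping.get(key)

first_skill = array_get(row["skills"], 0)      # skills[0]
office = map_get(row["addresses"], "office")   # addresses['office']
job_title = row["job_info"]["job_title"]       # job_info.job_title
has_python = "Python" in row["skills"]         # array_contains(skills, 'Python')

print(first_skill, office, job_title, has_python)
print(array_get(row["skills"], 10), map_get(row["addresses"], "warehouse"))
```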



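The following queries go further on the same table, combining built-in functions, aggregation, and `LATERAL VIEW`:

1. **Find employees who have more than two skills:**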

   ```sql
   SELECT 
       name, 
       size(skills) AS skill_count
   FROM employee_details
   WHERE size(skills) > 2;
   ```

2. **Get employees with a specific job title and their respective office addresses:**

   ```sql
   SELECT 
       name, 
       addresses['office'] AS office_address 
   FROM employee_details 
   WHERE job_info.job_title = 'Data Scientist';
   ```

3. **Retrieve employees whose salary is greater than the average salary in the table:**

   ```sql
   SELECT 
       name, 
       job_info.salary 
   FROM employee_details 
   WHERE job_info.salary > (SELECT AVG(job_info.salary) FROM employee_details);
   ```

4. **List all employees who have both 'Java' and 'Python' as skills:**

   ```sql
   SELECT 
       name 
   FROM employee_details 
   WHERE array_contains(skills, 'Java') AND array_contains(skills, 'Python');
   ```

5. **Fetch employee names along with all skills sorted alphabetically:**

   ```sql
   SELECT 
       name, 
       sort_array(skills) AS sorted_skills 
   FROM employee_details;
   ```

6. **Get the total number of employees in each job title:**

   ```sql
   SELECT 
       job_info.job_title, 
       COUNT(*) AS total_employees 
   FROM employee_details 
   GROUP BY job_info.job_title;
   ```

7. **List employees whose home address is in a particular city (e.g., 'New York'):**

   ```sql
   SELECT 
       name 
   FROM employee_details 
   WHERE addresses['home'] = 'New York';
   ```

8. **Retrieve employees whose salary is within the top 20% of all salaries:**

   ```sql
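   -- LIMIT only accepts a constant, so rank salaries into five buckets instead
   SELECT 
       name, 
       salary 
   FROM (
       SELECT 
           name, 
           job_info.salary AS salary, 
           NTILE(5) OVER (ORDER BY job_info.salary DESC) AS salary_bucket 
       FROM employee_details
   ) ranked 
   WHERE salary_bucket = 1;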
   ```

9. **Find the highest salary for each job title:**

   ```sql
   SELECT 
       job_info.job_title, 
       MAX(job_info.salary) AS max_salary 
   FROM employee_details 
   GROUP BY job_info.job_title;
   ```

10. **List the number of unique skills across all employees:**

   ```sql
   SELECT 
       COUNT(DISTINCT skill) AS unique_skill_count 
   FROM employee_details 
   LATERAL VIEW explode(skills) skillTable AS skill;
   ```

11. **Find employees with the longest tenure (based on the `start_date` in `job_info`):**

   ```sql
   SELECT 
       name, 
       job_info.start_date 
   FROM employee_details 
   ORDER BY from_unixtime(unix_timestamp(job_info.start_date, 'yyyy-MM-dd')) ASC 
   LIMIT 1;
   ```

12. **Calculate the average salary by the number of skills an employee possesses:**

   ```sql
   SELECT 
       size(skills) AS num_skills, 
       AVG(job_info.salary) AS avg_salary 
   FROM employee_details 
   GROUP BY size(skills);
   ```

13. **Get the names and job titles of employees who have more than one address type (e.g., both home and office):**

   ```sql
   SELECT 
       name, 
       job_info.job_title 
   FROM employee_details 
   WHERE size(addresses) > 1;
   ```

14. **Retrieve employees and their skills that include 'Python' but not 'Java':**

   ```sql
   SELECT 
       name, 
       skills 
   FROM employee_details 
   WHERE array_contains(skills, 'Python') AND NOT array_contains(skills, 'Java');
   ```
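
Query 10 relies on `LATERAL VIEW explode`, which turns each array element into its own row, joined back to the row it came from. A rough Python model of that behavior, using the skills of the three sample employees (the `explode_skills` helper is a hypothetical name for this illustration):

```python
employees = [
    ("John Doe", ["Java", "Python", "SQL"]),
    ("Jane Smith", ["Python", "R", "Hadoop"]),
    ("Mark Johnson", ["Scala", "Spark", "Kafka"]),
]

def explode_skills(rows):
    """Yield one (name, skill) pair per array element, like LATERAL VIEW explode."""
    for name, skills in rows:
        for skill in skills:
            yield name, skill

exploded = list(explode_skills(employees))

# Equivalent of COUNT(DISTINCT skill) over the exploded rows
unique_skill_count = len({skill for _, skill in exploded})

print(len(exploded), unique_skill_count)
```

As in Hive without the `OUTER` keyword, an employee whose array is empty contributes no rows to the exploded result.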



In [None]:
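-- Complex-type constructors are not accepted in INSERT ... VALUES, so use INSERT ... SELECT
INSERT INTO TABLE employee_details 
SELECT 
    1, 
    'John Doe', 
    array('Java', 'Python', 'SQL'), 
    map('home', 'New York', 'office', 'San Francisco'), 
    named_struct('job_title', 'Engineer', 'start_date', '2022-01-10', 'salary', CAST(85000 AS DOUBLE));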

### 1. **View for Basic Employee Information**

Create a view to show basic employee information such as `id`, `name`, and `job_title`:

```sql
CREATE VIEW employee_basic_info AS
SELECT 
    id, 
    name, 
    job_info.job_title AS job_title 
FROM employee_details;
```

**Query the View:**

```sql
SELECT * FROM employee_basic_info;
```

This will return a result set with columns: `id`, `name`, and `job_title`.

### 2. **View for Employees with High Salaries**

Create a view to show employees whose salary is above a certain threshold (e.g., 90,000):

```sql
CREATE VIEW high_salary_employees AS
SELECT 
    id, 
    name, 
    job_info.salary 
FROM employee_details 
WHERE job_info.salary > 90000;
```

**Query the View:**

```sql
SELECT * FROM high_salary_employees;
```

This will return a list of employees with salaries greater than 90,000.

### 3. **View for Employee Skills**

Create a view to show each employee's name along with their skills exploded into separate rows:

```sql
CREATE VIEW employee_skills_exploded AS
SELECT 
    name, 
    skill 
FROM employee_details 
LATERAL VIEW explode(skills) skill_table AS skill;
```

**Query the View:**

```sql
SELECT * FROM employee_skills_exploded;
```

This will return a result set where each employee's skill is listed in a separate row.

### 4. **View for Employee Address Information**

Create a view to display employees' names and their home addresses:

```sql
CREATE VIEW employee_home_addresses AS
SELECT 
    name, 
    addresses['home'] AS home_address 
FROM employee_details;
```

**Query the View:**

```sql
SELECT * FROM employee_home_addresses;
```

This will return a list of employee names and their corresponding home addresses.

### 5. **View for Employees with Specific Skills**

Create a view to list employees who have specific skills (e.g., `Python`):

```sql
CREATE VIEW employees_with_python_skill AS
SELECT 
    id, 
    name 
FROM employee_details 
WHERE array_contains(skills, 'Python');
```

**Query the View:**

```sql
SELECT * FROM employees_with_python_skill;
```

This will return all employees who have `Python` as one of their skills.

### 6. **View for Job Titles with Employee Count**

Create a view to show the count of employees in each job title:

```sql
CREATE VIEW job_title_employee_count AS
SELECT 
    job_info.job_title, 
    COUNT(*) AS total_employees 
FROM employee_details 
GROUP BY job_info.job_title;
```

**Query the View:**

```sql
SELECT * FROM job_title_employee_count;
```

This will provide a count of how many employees hold each job title.

### 7. **View for Average Salary by Number of Skills**

Create a view to calculate the average salary grouped by the number of skills an employee possesses:

```sql
CREATE VIEW avg_salary_by_skill_count AS
SELECT 
    size(skills) AS num_skills, 
    AVG(job_info.salary) AS avg_salary 
FROM employee_details 
GROUP BY size(skills);
```

**Query the View:**

```sql
SELECT * FROM avg_salary_by_skill_count;
```

This will show the average salary for employees based on the number of skills they have.

### 8. **View for Recent Hires**

Create a view to list employees who were hired after a specific date:

```sql
CREATE VIEW recent_hires AS
SELECT 
    name, 
    job_info.start_date 
FROM employee_details 
WHERE job_info.start_date > '2022-01-01';
```

**Query the View:**

```sql
SELECT * FROM recent_hires;
```

This will return employees who started after January 1, 2022.
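
Because `start_date` is stored as a `STRING`, this filter is ultimately a string comparison, and the outcome depends on both sides using the same format. The Python sketch below illustrates the point with ordinary lexicographic comparison. The mixed-format case uses a timestamp-style string such as the one `from_unixtime` returns by default (`yyyy-MM-dd HH:mm:ss`); the variable names are made up for this example.

```python
boundary = "2022-01-01"

# Same format on both sides: ISO dates sort lexicographically in calendar order,
# and the boundary day itself is not "greater than" the boundary.
same_format = {d: d > boundary for d in ["2021-05-15", "2022-01-01", "2022-01-10"]}

# Mixed formats: a timestamp string for the boundary day compares as greater,
# because the longer string wins once the shared prefix matches.
mixed_format = "2022-01-01 00:00:00" > boundary

print(same_format)
print(mixed_format)
```

Comparing plain `yyyy-MM-dd` strings on both sides keeps "after January 1, 2022" strict, whereas a timestamp-formatted left-hand side would also let the boundary day through.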

### 9. **View for Employees with Multiple Addresses**

Create a view to show employees who have more than one type of address (e.g., both home and office):

```sql
CREATE VIEW employees_with_multiple_addresses AS
SELECT 
    name, 
    size(addresses) AS num_addresses 
FROM employee_details 
WHERE size(addresses) > 1;
```

**Query the View:**

```sql
SELECT * FROM employees_with_multiple_addresses;
```

This will return the employees who have more than one address type stored.

### 10. **View for Employees in Specific Cities**

Create a view to list employees whose home address is in a specific city (e.g., 'New York'):

```sql
CREATE VIEW employees_in_new_york AS
SELECT 
    name 
FROM employee_details 
WHERE addresses['home'] = 'New York';
```

**Query the View:**

```sql
SELECT * FROM employees_in_new_york;
```

This will return employees who have 'New York' as their home address.


In [None]:
from pyhive import hive

# Define connection parameters
host = 'localhost'  # Since Hive is running locally
port = 10000        # Default port for HiveServer2
username = 'ankit810248bgre'  # Replace with your username
database = 'default'        # Use 'default' or specify your database name

# Establish a connection
conn = hive.Connection(host=host, port=port, username=username, database=database)

# Create a cursor object
cursor = conn.cursor()

# Execute a Hive query
cursor.execute("SHOW TABLES")

# Fetch results
results = cursor.fetchall()

# Print the results
for row in results:
    print(row)

# Close the cursor and connection
cursor.close()
conn.close()
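
When queries on `employee_details` are run through PyHive, complex columns typically arrive as JSON-formatted strings rather than native Python lists and dictionaries. This is an assumption about HiveServer2's usual behavior and is worth verifying against your own setup. A minimal sketch of decoding such a row, using a hard-coded sample instead of a live connection (the `decode_row` helper is hypothetical):

```python
import json

def decode_row(row):
    """Decode JSON-looking string columns; leave every other value untouched."""
    decoded = []
    for value in row:
        if isinstance(value, str) and value[:1] in ("[", "{"):
            try:
                value = json.loads(value)
            except json.JSONDecodeError:
                pass  # not valid JSON after all, keep the raw string
        decoded.append(value)
    return tuple(decoded)

# Hard-coded stand-in for one tuple returned by cursor.fetchall()
sample = (
    1,
    "John Doe",
    '["Java","Python","SQL"]',
    '{"home":"New York","office":"San Francisco"}',
)
row_id, name, skills, addresses = decode_row(sample)
print(skills[0], addresses["office"])
```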


In [None]:
from sqlalchemy import create_engine
import pandas as pd

# Define the Hive connection parameters
username = 'ankit810248bgre'  # Replace with your Hive username
host = 'localhost'            # Use 'localhost' if Hive is running locally
port = 10000                  # Default port for HiveServer2
database = 'default'          # Replace with your Hive database name

# Create the connection string for SQLAlchemy
connection_string = f'hive://{username}@{host}:{port}/{database}'

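# Create a SQLAlchemy engine (the hive:// dialect is registered by PyHive)
engine = create_engine(connection_string)

# Run a query and load the result into a pandas DataFrame
df = pd.read_sql("SELECT * FROM employee_details LIMIT 10", engine)
print(df.head())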

In [None]:
CREATE TABLE your_table_name (
    id INT,
    name STRING
)
PARTITIONED BY (`date` STRING)  -- date is a reserved keyword, so quote it with backticks
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
