## Install and Configure Hive


### **1. Install Hive on a Hadoop Environment**
<br>

**Link to download:**  https://archive.apache.org/dist/hive/hive-2.3.9/apache-hive-2.3.9-bin.tar.gz

<br>

**Prerequisites:**

* Hadoop must be running (`start-dfs.cmd`, `start-yarn.cmd`).
* **Derby Database**: Hive uses an embedded Derby database for its metadata by default. The Derby libraries ship with the Hive binaries, so no separate installation is needed.

**Step A: Download and Setup**

1. Download the **Apache Hive binaries** (e.g., version 2.3.9, the release linked above) from the Apache archive.
2. Extract to `C:\hive` (or `/usr/local/hive` on Linux).
3. Set **Environment Variables**:
   * `HIVE_HOME` = `C:\hive`
   * Add `%HIVE_HOME%\bin` to system **Path**.
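On Linux, the same two settings can be made in the shell. This is a minimal sketch that assumes Hive was extracted to `/usr/local/hive` (the Linux path mentioned in step 2); adjust the path if your layout differs.

```shell
# Assumption: Hive was extracted to /usr/local/hive.
export HIVE_HOME=/usr/local/hive

# Append Hive's bin directory so the `hive` and `schematool` commands resolve.
export PATH="$PATH:$HIVE_HOME/bin"

# Print the value to confirm it is set in this session.
echo "HIVE_HOME=$HIVE_HOME"
```

These exports last only for the current session. To make them permanent, you would typically add the two `export` lines to a shell startup file such as `~/.bashrc`.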



**Step B: Configure hive-site.xml**
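Go to `C:\hive\conf` and create or edit `hive-site.xml`. Add the following to configure the metastore (where Hive stores table definitions). Because the Derby URL uses a relative `databaseName`, Derby creates `metastore_db` in the directory from which the command is launched, so run `schematool` and `hive` from the same directory.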

```xml
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
</configuration>
```

**Step C: Create HDFS Folders**
Hive needs a warehouse directory and a scratch directory (`/tmp`) in HDFS, both group-writable.

```bash
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod g+w /user/hive/warehouse
hdfs dfs -mkdir -p /tmp
hdfs dfs -chmod g+w /tmp
```

**Step D: Initialize Schema & Start Hive**
Run this **once** to create the metastore database:

```cmd
schematool -dbType derby -initSchema
```

Start the Hive Shell:

```cmd
hive
```
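Before working interactively, it can help to run one statement non-interactively as a smoke test. The sketch below uses the CLI's `-e` option to execute a single query. The guard around it is an illustrative addition so that the snippet exits cleanly on a machine where `hive` is not yet on the `PATH`.

```shell
# Run a single statement without entering the interactive shell.
# The guard is an assumption-friendly wrapper: it only calls hive if the command resolves.
if command -v hive >/dev/null 2>&1; then
    hive -e 'SHOW DATABASES;'
    status="ran"
else
    echo "hive not found on PATH; check HIVE_HOME and PATH" >&2
    status="skipped"
fi

echo "smoke test: $status"
```

If the metastore was initialized correctly, the output should include the `default` database.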

---

### **2. Load a Dataset into Hive**

**Step A: Create Local Data**
Create a file named `students.txt` at `C:\students.txt` (the path used in the load step):

```text
101,John,IT
102,Jane,CS
103,Bob,Civil
104,Alice,IT
```
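On Linux, or in a bash shell on Windows, the file can also be created from the command line. This sketch writes `students.txt` to the current directory, which is an assumption. The path in the `LOAD DATA` statement would then need to point to wherever the file actually lands.

```shell
# Write the four sample rows to students.txt in the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside the block.
cat > students.txt <<'EOF'
101,John,IT
102,Jane,CS
103,Bob,Civil
104,Alice,IT
EOF

# Sanity check: print the number of lines written.
wc -l < students.txt
```

Avoid blank lines in the file, since Hive would read a blank line as a row of `NULL` values.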

**Step B: Create Table**
In the Hive shell (`hive>`), run:

```sql
CREATE TABLE students (
    id INT,
    name STRING,
    dept STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
```

**Step C: Load Data**
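Load the file from your local system into the Hive table. The `LOCAL` keyword tells Hive to read from the local filesystem rather than HDFS; the file is copied into the table's warehouse directory.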

```sql
LOAD DATA LOCAL INPATH 'C:/students.txt' INTO TABLE students;
```

---

### **3. Run Basic HiveQL Queries**

**Select (Read Data)**
Check that the data loaded correctly.

```sql
SELECT * FROM students;
```

**Filter Data**
Retrieve only the names of students in the IT department.

```sql
SELECT name FROM students WHERE dept = 'IT';
```
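As a cross-check outside Hive, the same filter can be approximated on the local sample file with `awk`. This is only a rough sketch for verifying the expected rows, not a description of how Hive executes the query. It assumes the four-row `students.txt` from section 2 and recreates it in the current directory so the snippet is self-contained.

```shell
# Recreate the sample file so the snippet is self-contained.
printf '%s\n' '101,John,IT' '102,Jane,CS' '103,Bob,Civil' '104,Alice,IT' > students.txt

# Split on commas and print the name (field 2) where the department (field 3) is IT.
it_names=$(awk -F, '$3 == "IT" { print $2 }' students.txt)
echo "$it_names"
```

For the sample data, both this command and the HiveQL query should return John and Alice.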

**Insert (Add Data)**
Insert a single record. With the default MapReduce execution engine, this launches a job, so it may be slow.

```sql
INSERT INTO TABLE students VALUES (105, 'Mike', 'Mech');
```

**Delete (Clear Data)**
Row-level `DELETE` is supported only on transactional (ACID) tables, so it does not work on a plain text table like `students`. You can, however, clear the table or drop it.

```sql
-- Remove all rows but keep the table definition
TRUNCATE TABLE students;

-- Remove the table definition (for a managed table like this one, the data is deleted too)
DROP TABLE students;
```

---
