#### About

> DataOps

DataOps is an approach to managing the entire data lifecycle, from data ingestion and processing to analysis and implementation, using Agile and DevOps principles. It involves collaboration between data engineers, data scientists, and business stakeholders to quickly and efficiently develop, test, and deploy data-driven applications. 

At its core, DataOps focuses on automating and simplifying the processes involved in building and deploying data-driven applications. This includes automated testing and deployment, continuous integration and delivery (CI/CD), version control, and data quality monitoring. By implementing DataOps, organizations can increase the speed and accuracy of data operations, improve data quality, and reduce the time and cost of data-driven projects.

For example, a data team can use DataOps principles to build a pipeline that ingests and processes customer data from a variety of sources, such as social media and online purchases. The pipeline can include automated data quality checks, data enrichment and feature engineering, as well as automated testing and deployment to production.
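As a rough illustration of the automated data quality checks mentioned above, the sketch below validates a DataFrame before it moves further down a pipeline. The function name `check_quality`, the expected column list, and the rule that measurements must be positive are assumptions made for this example, not part of any DataOps standard.

```python
import pandas as pd

# Hypothetical expectations for the iris data used later in this notebook.
EXPECTED_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
NUMERIC_COLUMNS = EXPECTED_COLUMNS[:4]


def check_quality(df: pd.DataFrame) -> list:
    """Return a list of human-readable problems; an empty list means the data passed."""
    problems = []

    # Schema check: stop early if columns are missing, since later checks need them.
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems

    # Completeness check: no nulls in any expected column.
    if df[EXPECTED_COLUMNS].isna().any().any():
        problems.append("null values present")

    # Range check (assumed rule): physical measurements must be positive.
    if (df[NUMERIC_COLUMNS] <= 0).any().any():
        problems.append("non-positive measurements present")

    return problems
```

In a pipeline, a non-empty result would typically stop the run before any data is written, for example with `assert not check_quality(data)`.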



In [1]:
# Using DataOps practices to load data from a CSV file into a PostgreSQL database

In [3]:
import pandas as pd
import psycopg2


In [4]:
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data


--2023-05-02 08:09:25--  https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4551 (4.4K) [application/x-httpd-php]
Saving to: ‘iris.data’


2023-05-02 08:09:26 (22.4 MB/s) - ‘iris.data’ saved [4551/4551]



In [5]:
# Define input data schema
input_schema = {
    'sepal_length': float,
    'sepal_width': float,
    'petal_length': float,
    'petal_width': float,
    'class': str
}


In [6]:
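# Load data from CSV file into a Pandas dataframe.
# iris.data has no header row, so supply the column names explicitly;
# otherwise the first record is consumed as the header.
data = pd.read_csv('iris.data', header=None, names=list(input_schema), dtype=input_schema)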


In [10]:
# Define new column names
new_columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']


In [11]:
# Rename the columns of the DataFrame
data.columns = new_columns


In [12]:
data.columns

Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
       'species'],
      dtype='object')

In [13]:
# Transform data: derive a sepal_area feature (sepal length times sepal width)
data['sepal_area'] = data['sepal_length'] * data['sepal_width']
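One way to bring the automated testing idea from the introduction to this step is to wrap the transformation in a function and test it on a tiny hand-built sample. This is only a sketch: the names `add_sepal_area` and `test_add_sepal_area` are made up for illustration, and a real project would more likely keep the test in a separate file run by a test runner in CI.

```python
import pandas as pd


def add_sepal_area(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a sepal_area column (length times width)."""
    out = df.copy()
    out["sepal_area"] = out["sepal_length"] * out["sepal_width"]
    return out


def test_add_sepal_area() -> None:
    # Small sample with values whose products are exact in floating point.
    sample = pd.DataFrame({"sepal_length": [5.0, 6.0], "sepal_width": [3.0, 2.5]})
    result = add_sepal_area(sample)
    assert result["sepal_area"].tolist() == [15.0, 15.0]
    # The input frame is left untouched because the function works on a copy.
    assert "sepal_area" not in sample.columns


test_add_sepal_area()
```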


In [None]:
# Connect to PostgreSQL database
conn = psycopg2.connect(
    host="localhost",
    database="mydatabase",
    user="myusername",
    password="mypassword"
)
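Hard-coded credentials are fine for a local demo, but a pipeline kept under version control usually reads them from the environment instead. The sketch below shows one possible approach; the helper name `db_config_from_env` is hypothetical, and the choice of the `PGHOST`, `PGDATABASE`, `PGUSER`, and `PGPASSWORD` variables (the names libpq itself recognizes) with the notebook's placeholder values as defaults is an assumption.

```python
import os


def db_config_from_env(env=None) -> dict:
    """Build keyword arguments for psycopg2.connect from environment variables."""
    env = os.environ if env is None else env
    if "PGPASSWORD" not in env:
        # Fail loudly rather than falling back to a default password.
        raise KeyError("PGPASSWORD must be set")
    return {
        "host": env.get("PGHOST", "localhost"),
        "database": env.get("PGDATABASE", "mydatabase"),
        "user": env.get("PGUSER", "myusername"),
        "password": env["PGPASSWORD"],
    }
```

The result can then be passed straight to the connection call, as in `conn = psycopg2.connect(**db_config_from_env())`.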


In [None]:
# Create table in PostgreSQL database
# IF NOT EXISTS keeps this cell safe to re-run without a "relation already exists" error
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS iris (
            sepal_length FLOAT,
            sepal_width FLOAT,
            petal_length FLOAT,
            petal_width FLOAT,
            class VARCHAR(50),
            sepal_area FLOAT
        );
    """)
    conn.commit()


In [None]:
# Load data into PostgreSQL database
# The DataFrame column was renamed to 'species' above; the table column is still 'class'
with conn.cursor() as cur:
    for _, row in data.iterrows():
        cur.execute("""
            INSERT INTO iris (sepal_length, sepal_width, petal_length, petal_width, class, sepal_area)
            VALUES (%s, %s, %s, %s, %s, %s);
        """, (row['sepal_length'], row['sepal_width'], row['petal_length'], row['petal_width'], row['species'], row['sepal_area']))
    conn.commit()

# Close connection to PostgreSQL database
conn.close()
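As an alternative to the row-by-row loop above, the load step could be wrapped in a function that inserts all rows in a single transaction and rolls back on failure, so a partial load never reaches the table. This is a minimal sketch: `load_rows` is a hypothetical helper, and it assumes `conn` is an open psycopg2 connection (it must be called before `conn.close()`).

```python
from typing import Iterable, Sequence

INSERT_SQL = """
    INSERT INTO iris (sepal_length, sepal_width, petal_length, petal_width, class, sepal_area)
    VALUES (%s, %s, %s, %s, %s, %s);
"""


def load_rows(conn, rows: Iterable[Sequence]) -> int:
    """Insert all rows in one transaction; roll back if any insert fails."""
    batch = list(rows)
    try:
        with conn.cursor() as cur:
            # executemany runs the same statement once per row tuple.
            cur.executemany(INSERT_SQL, batch)
        conn.commit()
    except Exception:
        # Undo anything written so far, then let the caller see the error.
        conn.rollback()
        raise
    return len(batch)
```

With the DataFrame from this notebook, the rows could be produced in table column order with `data[['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species', 'sepal_area']].itertuples(index=False, name=None)`.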