## Importance of data in AI development
Data is the backbone of AI, whether it is used for training ML models, validating algorithms, or deploying AI systems. The quality and security of the data that developers use directly affect an AI solution's performance and reliability. However, with great power comes great responsibility: the sensitive nature of this data makes it a prime target for malicious activity.

### Key points
#### 1. Training data
AI models learn from the data they are trained on. If this data is compromised, the model can produce biased, inaccurate, or even harmful outputs.
#### 2. Inference Data
During deployment, AI systems process new data inputs to generate predictions or decisions. Ensuring the integrity of this data is crucial for maintaining trust in the AI systems.
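
As a minimal sketch of what checking inference inputs could look like, the snippet below validates a single record against an expected schema before it reaches the model. The field names (`age`, `income`), their ranges, and the `validate_record` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical schema: field name -> (expected type, minimum, maximum)
EXPECTED_SCHEMA = {
    "age": (int, 0, 120),
    "income": (float, 0.0, 1e7),
}

def validate_record(record: dict) -> dict:
    """Return the record unchanged if it matches the schema, else raise ValueError."""
    unexpected = set(record) - set(EXPECTED_SCHEMA)
    if unexpected:
        raise ValueError(f"Unexpected fields: {sorted(unexpected)}")
    for field, (expected_type, low, high) in EXPECTED_SCHEMA.items():
        if field not in record:
            raise ValueError(f"Missing field: {field}")
        value = record[field]
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"{field} must be of type {expected_type.__name__}")
        # NaN fails this comparison and is therefore rejected as well
        if not low <= value <= high:
            raise ValueError(f"{field}={value} is outside [{low}, {high}]")
    return record
```

Rejecting a bad record outright, rather than silently correcting it, keeps the failure visible to whoever operates the system.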
#### 3. Data sensitivity
Sensitive data, such as ***Personally Identifiable Information*** (PII), must be protected to avoid severe legal and financial consequences.

### Key risks
#### 1. Data Breaches
Unauthorized access can expose private data.
#### 2. Data poisoning
Injecting false or misleading data into the training set causes the AI to learn incorrect patterns.
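
One simple heuristic for spotting possible poisoning is to compare the class balance of a new batch of labels against a trusted baseline. The sketch below is an illustration under that assumption, not a complete defence; the `label_shift` helper and any threshold applied to its result are hypothetical.

```python
from collections import Counter

def label_shift(baseline_labels, new_labels):
    """Return the largest absolute change in class proportion between two label lists."""
    if not baseline_labels or not new_labels:
        raise ValueError("Both label lists must be non-empty")
    baseline_counts = Counter(baseline_labels)
    new_counts = Counter(new_labels)
    classes = set(baseline_counts) | set(new_counts)
    # Counter returns 0 for classes that are missing from one of the lists
    return max(
        abs(baseline_counts[c] / len(baseline_labels) - new_counts[c] / len(new_labels))
        for c in classes
    )
```

A large shift does not prove that the data was poisoned, but it is a cheap signal that a batch deserves a manual review before it is used for training.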

### Key Practices
#### 1. Data Encryption
Encrypting data both at rest and in transit ensures that, even if it is intercepted, it cannot be read without the right encryption key.
***Implementation***: Use strong encryption standards such as ***AES-256*** for data storage and Transport Layer Security (TLS, the successor to the now-deprecated Secure Sockets Layer) for data transmission to protect against unauthorized access.
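
For the in-transit half, a minimal sketch using only Python's standard `ssl` module could look like this. It assumes a client that connects to a data service over TLS; the `make_client_context` name is hypothetical. Encryption at rest with AES-256 needs a third-party library and is not shown here.

```python
import ssl

def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that verifies certificates and refuses old protocol versions."""
    # create_default_context() enables certificate and hostname verification
    context = ssl.create_default_context()
    # Refuse SSLv3, TLS 1.0 and TLS 1.1
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The returned context can then be passed to `context.wrap_socket(...)` or to an HTTP client that accepts an `ssl.SSLContext`.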

#### 2. Access control
Implement role-based access control (RBAC) and ***Multi-factor authentication*** (MFA).
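
Role-based access control can be sketched as a mapping from roles to permissions plus a check before each sensitive operation. The roles, permissions, and the `requires` decorator below are hypothetical examples; multi-factor authentication is a separate mechanism and is not shown.

```python
from functools import wraps

# Hypothetical role -> permissions mapping
ROLE_PERMISSIONS = {
    "data_scientist": {"read_dataset", "train_model"},
    "auditor": {"read_audit_log"},
    "admin": {"read_dataset", "train_model", "read_audit_log", "delete_dataset"},
}

def requires(permission):
    """Decorator that allows the call only if the user's role grants `permission`."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            # Unknown or missing roles get an empty permission set
            granted = ROLE_PERMISSIONS.get(user.get("role"), set())
            if permission not in granted:
                raise PermissionError(f"{user.get('name')} may not {permission}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator

@requires("delete_dataset")
def delete_dataset(user, dataset_name):
    return f"{dataset_name} deleted by {user['name']}"
```

Calling `delete_dataset({"name": "alice", "role": "admin"}, "user_data")` succeeds, while the same call with the `auditor` role raises `PermissionError`.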

#### 3. Data anonymization and regular audits
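One common building block here is pseudonymization: replacing a direct identifier with a keyed hash so records can still be joined without exposing the raw value. The sketch below assumes a secret key that is stored outside the dataset; the `pseudonymize` helper and the sample values are hypothetical. Note that this is pseudonymization rather than full anonymization, because anyone holding the key can recompute the mapping.

```python
import hashlib
import hmac

def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical usage: the key would normally come from a secrets manager
key = b"replace-with-a-secret-from-a-vault"
token = pseudonymize("alice@example.com", key)
```

Using a keyed hash instead of a plain hash makes it harder to reverse the tokens by hashing a list of guessed identifiers.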

## Practice activity: auditing an ML system
Start by reviewing the code and identifying security vulnerabilities such as:
- ***Data Validation and sanitization***: Is the input data validated or sanitized to prevent malicious input?
- ***Input Validation***: Are there any checks on the input data to make sure it meets expected formats or values?
- ***Random state and seed management***: Is the random state used securely to prevent model predictability?
- ***Model security***: Are there security measures to protect the model from tampering or unauthorized access?
- ***Encryption***: Is sensitive data encrypted when transmitted or stored?
- ***Integrity checks***: Are there mechanisms to verify the integrity of the model when loading it from storage?
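
The integrity question in the checklist can be illustrated with a keyed tag that is computed when the model is saved and verified before it is unpickled. This is a minimal sketch using only the standard library; `sign_bytes`, `safe_loads`, and the way the key and tag are stored are assumptions made for illustration.

```python
import hashlib
import hmac
import pickle

def sign_bytes(payload: bytes, key: bytes) -> str:
    """Return an HMAC-SHA256 tag for the serialized model bytes."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def safe_loads(payload: bytes, expected_tag: str, key: bytes):
    """Unpickle the payload only if its tag matches the one recorded at save time."""
    actual_tag = sign_bytes(payload, key)
    # compare_digest avoids leaking information through comparison timing
    if not hmac.compare_digest(actual_tag, expected_tag):
        raise ValueError("Model integrity check failed")
    return pickle.loads(payload)
```

Verifying before calling `pickle.loads` matters because unpickling untrusted bytes can execute arbitrary code.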

#### Review of flawed code block

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import pickle

# Load dataset (Flaw: no data validation or sanitization)
data = pd.read_csv('user_data.csv')

# Solution: validate the input data
def validate_data(df):
    # Reject the dataset if any cell is null
    if df.isnull().values.any():
        raise ValueError('Dataset contains null values, please clean data before processing')
    # Additional validation checks can be added here
    return df

# Load validated data
data = validate_data(pd.read_csv('user_data.csv'))


# Split data into features and targets (Flaw : No input validation)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Solution: validate the inputs
X = validate_data(data.iloc[:, :-1])
y = validate_data(data.iloc[:, -1])

# Split the data into training and testing sets (Flaw : Fixed random state)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 46)

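# Solution: use a securely generated random state
# (random_state must be an integer below 2**32, so convert 4 random bytes to an int)
import os
secure_seed = int.from_bytes(os.urandom(4), 'big')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = secure_seed)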

# Train a simple logistic regression (Flaw: no model security check)
model = LogisticRegression()
model.fit(X_train, y_train)

# Save the model to disk (Flaw: no encrypted model saving)
filename = 'Finalized_model.sav'
with open(filename, 'wb') as f:
    pickle.dump(model, f)

# Solution: encrypt the model before saving
import cryptography.fernet
# The key must be stored securely and separately from the model file
key = cryptography.fernet.Fernet.generate_key()
cipher = cryptography.fernet.Fernet(key)
# Save the encrypted model to disk (this overwrites the unencrypted file)
filename = 'Finalized_model.sav'
encrypted_model = cipher.encrypt(pickle.dumps(model))
with open(filename, 'wb') as f:
    f.write(encrypted_model)


# Load the model from disk for later use (Flaw: no integrity check)
# Shown commented out: the file on disk is now encrypted, so unpickling it directly would fail
# loaded_model = pickle.load(open(filename, 'rb'))
# result = loaded_model.score(X_test, y_test)

# Solution: load the model and verify its integrity
import hashlib
with open(filename, 'rb') as f:
    encrypted_model = f.read()
decrypted_model = cipher.decrypt(encrypted_model)

# Compare the hash of the decrypted bytes with the hash of the original model
# (in practice, record the original hash at save time and store it separately)
loaded_model_hash = hashlib.sha256(decrypted_model).hexdigest()
original_model_hash = hashlib.sha256(pickle.dumps(model)).hexdigest()
if loaded_model_hash != original_model_hash:
    raise ValueError("Model integrity check failed")

# Unpickle only after the check passes, then evaluate the model
loaded_model = pickle.loads(decrypted_model)
result = loaded_model.score(X_test, y_test)
```