
**Exploring One-Dimensional Data:**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

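# Load historical stock prices from a CSV file, parsing 'Date' as datetimes
# so the x-axis is chronological rather than a sequence of strings
data = pd.read_csv('stock_prices.csv', parse_dates=['Date'])
data = data.sort_values('Date')  # rolling statistics below assume chronological order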

# Summarize the data
mean_price = data['Close'].mean()
min_price = data['Close'].min()
max_price = data['Close'].max()

print(f"Mean Price: {mean_price}")
print(f"Minimum Price: {min_price}")
print(f"Maximum Price: {max_price}")

# Visualize the data
plt.plot(data['Date'], data['Close'])
plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.title('Historical Stock Prices')
plt.xticks(rotation=45)
plt.show()

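# Identify outliers: closing prices more than 3 standard deviations from the mean, in either direction
std_price = data['Close'].std()
outliers = data[(data['Close'] - mean_price).abs() > 3 * std_price]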
print("Outliers:")
print(outliers)

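# Analyze trends with a 30-row rolling mean (the first 29 values are NaN;
# the window counts rows, which may not equal 30 calendar days)
rolling_mean = data['Close'].rolling(window=30).mean()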
plt.plot(data['Date'], data['Close'], label='Stock Price')
plt.plot(data['Date'], rolling_mean, label='30-day Moving Average')
plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.title('Stock Price and 30-day Moving Average')
plt.xticks(rotation=45)
plt.legend()
plt.show()


This code produces summary statistics and two plots for the historical stock price data. Specifically, it will:

- Print the mean, minimum, and maximum closing prices, which give a sense of the price range and typical value.
- Display a line plot of closing price over time, with dates on the x-axis and prices on the y-axis, showing historical trends and fluctuations.
- Print any outliers, that is, closing prices that lie far from the mean as judged by a three-standard-deviation threshold. Outliers may indicate abnormal price movements or market events.
- Display a second line plot that overlays the 30-day moving average on the closing prices. The moving average smooths out short-term fluctuations and makes the underlying trend easier to see.

Together, these steps give a first picture of the stock's behavior: its typical level, any unusual price movements, and the direction of its trend over time.
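
Because `stock_prices.csv` is not included here, the outlier rule and the moving average can be tried on a small synthetic series instead. The following is a minimal sketch, assuming made-up prices (a smooth wave with one injected spike) and a hypothetical helper named `flag_outliers`; neither comes from the notebook's data.

```python
import numpy as np
import pandas as pd


def flag_outliers(series: pd.Series, n_std: float = 3.0) -> pd.Series:
    """Return a boolean mask marking values more than n_std standard deviations from the mean."""
    return (series - series.mean()).abs() > n_std * series.std()


# Synthetic closing prices (an assumption for illustration only)
dates = pd.date_range("2023-01-01", periods=120, freq="D")
t = np.arange(120)
close = pd.Series(100 + 2 * np.sin(t / 10), index=dates, name="Close")
close.iloc[60] += 50  # inject one artificial price spike

mask = flag_outliers(close)
rolling_mean = close.rolling(window=30).mean()

print("Flagged outliers:")
print(close[mask])
print("Leading NaN values in the 30-row rolling mean:", rolling_mean.isna().sum())
```

Note that a large spike also inflates the mean and standard deviation used to detect it, so on real data the threshold is worth checking against a plot rather than trusting blindly.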

**Exploring Two-Dimensional Data:**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Load housing price data from a CSV file
data = pd.read_csv('housing_prices.csv')

# Summarize the data
mean_price = data['Price'].mean()
correlation = data['Price'].corr(data['Area'])

print(f"Mean Price: {mean_price}")
print(f"Correlation between Price and Area: {correlation}")

# Visualize the data
plt.scatter(data['Area'], data['Price'])
plt.xlabel('Area (in sq. ft.)')
plt.ylabel('Price (in USD)')
plt.title('Housing Prices vs. Area')
plt.show()

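# Identify outliers: prices more than 3 standard deviations from the mean, in either direction
std_price = data['Price'].std()
outliers = data[(data['Price'] - mean_price).abs() > 3 * std_price]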
print("Outliers:")
print(outliers)

# Regression analysis
from scipy import stats

slope, intercept, r_value, p_value, std_err = stats.linregress(data['Area'], data['Price'])
print(f"Slope: {slope}, Intercept: {intercept}, R-value: {r_value}, P-value: {p_value}, Std Err: {std_err}")


The code prints the mean price and the Pearson correlation coefficient between price and area.
It draws a scatter plot of price against area, showing how prices vary with area.
It prints any outliers, identified by a three-standard-deviation threshold on price.
Finally, it reports the linear regression statistics: the slope and intercept of the fitted line, the R-value (correlation coefficient) indicating the strength of the linear relationship, the p-value for the null hypothesis that the slope is zero, and the standard error of the slope. Together these describe the linear relationship between price and area and indicate whether it is statistically significant.
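
To see how these statistics are read in practice, the regression can be run on synthetic data where the true relationship is known. This is a sketch under stated assumptions: the areas, prices, noise level, and the 2,000 sq. ft. query are all invented for illustration and do not come from `housing_prices.csv`.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumed): price = 50,000 + 150 * area, plus random noise
rng = np.random.default_rng(42)
area = rng.uniform(500, 3500, size=200)
price = 50_000 + 150 * area + rng.normal(0, 20_000, size=200)

result = stats.linregress(area, price)

# R-squared is the share of price variance explained by the fitted line
r_squared = result.rvalue ** 2

# Use the fitted line to estimate the price of a hypothetical 2,000 sq. ft. house
predicted = result.intercept + result.slope * 2000

print(f"Slope: {result.slope:.2f} USD per sq. ft.")
print(f"R-squared: {r_squared:.3f}")
print(f"P-value: {result.pvalue:.3g}")
print(f"Estimated price for 2000 sq. ft.: {predicted:,.0f} USD")
```

The slope is the quantity to interpret directly: it estimates how much the price changes per additional square foot, while the R-value only says how tightly the points follow the line.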

**Exploring High-Dimensional Data:**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load customer data from a CSV file
data = pd.read_csv('customer_data.csv')

# Data Preprocessing
data.dropna(inplace=True)  # Remove rows with missing values
# Extract feature columns (assumes the first column is an identifier and the rest are numeric)
features = data.iloc[:, 1:]

# Standardize the features
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

# K-Means Clustering
num_clusters = 5
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=42)  # fixed seed for reproducible clusters
data['Cluster'] = kmeans.fit_predict(scaled_features)

# Visualization using PCA for dimensionality reduction
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
pca_features = pca.fit_transform(scaled_features)
data['PCA1'] = pca_features[:, 0]
data['PCA2'] = pca_features[:, 1]

# Plot the clusters using PCA
plt.scatter(data['PCA1'], data['PCA2'], c=data['Cluster'], cmap='viridis')
plt.xlabel('PCA1')
plt.ylabel('PCA2')
plt.title('Customer Segmentation')
plt.show()


The code plots the customer segments in two dimensions, using the first two principal components (PCA1 and PCA2) as axes.
Each point represents a customer, and its color indicates the cluster it was assigned to.
The plot shows how customers group by feature similarity. Well-separated groups suggest a meaningful segmentation, although clusters that are distinct in the full feature space can still overlap once projected onto two components.
The number of clusters can be adjusted through the 'num_clusters' variable to explore different customer segmentation scenarios.
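
One common way to compare different values of 'num_clusters' is the silhouette score, which is higher when points sit close to their own cluster and far from the others. The sketch below is not part of the original notebook: it assumes synthetic data from `make_blobs` with four hand-picked centers as a stand-in for `customer_data.csv`, and the range of k values tried is arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customer features (assumed): four well-separated groups
centers = [[0, 0, 0], [10, 10, 0], [-10, 10, 0], [0, -10, 10]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=0)
scaled = StandardScaler().fit_transform(X)

# Fit K-Means for several cluster counts and record the silhouette score of each
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scaled)
    scores[k] = silhouette_score(scaled, labels)
    print(f"k={k}: silhouette={scores[k]:.3f}")

best_k = max(scores, key=scores.get)
print(f"Highest silhouette score at k={best_k}")
```

On real customer data the scores are usually much closer together than on clean synthetic blobs, so the result is best treated as one input alongside the PCA plot and domain knowledge.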