# Python in Data Science

Python is widely used in data science because of its rich ecosystem of libraries and tools. The sections below cover the main tasks, each with a short example:

### 1. Data Manipulation with Pandas



In [None]:
# Import required libraries
import pandas as pd
import numpy as np

# Create a sample dataset
data = {
    'name': ['John', 'Alice', 'Bob'],
    'age': [28, 24, 32],
    'salary': [50000, 45000, 70000]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Basic data operations
print(df.head())           # View first few rows
print(df.describe())       # Statistical summary
print(df['age'].mean())    # Calculate mean age

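# Filter data: keep rows with a salary strictly above 50,000
high_salary = df[df['salary'] > 50000]
print(high_salary)

# Group by operations (every age is unique in this sample, so each group holds one row)
avg_salary_by_age = df.groupby('age')['salary'].mean()
print(avg_salary_by_age)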
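
Grouping becomes more informative when the key repeats across rows. The following is a minimal sketch, assuming a hypothetical `department` column and a fourth row (`Dana`) that are not part of the sample dataset:

```python
import pandas as pd

# Hypothetical data: the 'department' column and the 'Dana' row are
# assumptions added for illustration only
df = pd.DataFrame({
    'name': ['John', 'Alice', 'Bob', 'Dana'],
    'department': ['eng', 'ops', 'eng', 'ops'],
    'salary': [50000, 45000, 70000, 55000],
})

# Mean salary per department; each group now contains two rows
avg_salary_by_dept = df.groupby('department')['salary'].mean()
print(avg_salary_by_dept)
```

The result is a Series indexed by the grouping key, so `avg_salary_by_dept['eng']` looks up a single group.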



### 2. Numerical Computing with NumPy



In [None]:
# Import NumPy
import numpy as np

# Create arrays
arr = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Array operations
print(arr.mean())          # Calculate mean
print(arr.std())           # Calculate standard deviation
print(matrix.shape)        # Get dimensions

# Mathematical operations
squared = np.square(arr)   # Square each element
sqrt = np.sqrt(arr)        # Square root of each element
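
The same reductions can run along one axis of a 2-D array, and NumPy broadcasting lets a per-column result combine with every row. A small sketch of both, reusing the `matrix` values from the cell above (the names `col_means`, `row_means`, and `centered` are illustrative):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 reduces over rows (one value per column),
# axis=1 reduces over columns (one value per row)
col_means = matrix.mean(axis=0)
row_means = matrix.mean(axis=1)

# Broadcasting: the shape-(3,) column means are subtracted from each row
centered = matrix - col_means

print(col_means)
print(row_means)
print(centered)
```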



### 3. Data Visualization with Matplotlib and Seaborn



In [None]:
# Import visualization libraries (NumPy is needed for the sample data)
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Create sample data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Basic line plot
plt.figure(figsize=(10, 6))
plt.plot(x, y, label='Sine Wave')
plt.title('Simple Line Plot')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.legend()
plt.show()

# Seaborn visualization (load_dataset fetches the sample data online on first use)
sns.set_style("whitegrid")
tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.show()



### 4. Machine Learning with Scikit-learn



In [None]:
# Import required libraries
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Create sample dataset
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Split data into training and testing sets (with 5 samples, test_size=0.2 holds out one point)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
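
An error measured on a single held-out split can vary a lot with which rows land in the test set. One common way to get a steadier estimate is k-fold cross-validation. The sketch below assumes synthetic data (`y = 3x + 2` plus noise) generated only for illustration; it is not the dataset from the cell above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (an assumption for illustration): y = 3x + 2 plus Gaussian noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 1, size=100)

# 5-fold cross-validation with shuffled folds
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# scikit-learn reports negated MSE so that higher scores are always better
scores = cross_val_score(
    LinearRegression(), X, y, cv=cv, scoring='neg_mean_squared_error'
)
mse_per_fold = -scores

print(f"MSE per fold: {mse_per_fold}")
print(f"Mean MSE: {mse_per_fold.mean():.3f}")
```

Each fold serves once as the test set, so every sample contributes to the estimate.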



### 5. Deep Learning with TensorFlow/Keras



In [None]:
# Import required libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# Create a simple neural network for binary classification on 10 input features
model = Sequential([
    Input(shape=(10,)),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Generate sample data (labels are random, so validation accuracy should stay near 50%)
X = np.random.random((1000, 10))
y = np.random.randint(2, size=(1000, 1))

# Train the model
history = model.fit(
    X, y,
    epochs=10,
    batch_size=32,
    validation_split=0.2
)



### 6. Data Preprocessing



In [None]:
# Import required libraries
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer

# Create sample data with missing values
data = np.array([[1, 2], [np.nan, 3], [7, 6]])

# Handle missing values
imputer = SimpleImputer(strategy='mean')
data_clean = imputer.fit_transform(data)

# Scale the data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data_clean)
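
The two steps above can also be chained in a scikit-learn `Pipeline`, so that the imputation means and scaling statistics learned from one dataset are reapplied unchanged to new rows. A minimal sketch, in which the step names `'impute'` and `'scale'` and the `new_rows` array are illustrative choices:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

data = np.array([[1, 2], [np.nan, 3], [7, 6]])

# Impute missing values with the column mean, then standardize each column
preprocess = Pipeline([
    ('impute', SimpleImputer(strategy='mean')),
    ('scale', StandardScaler()),
])
data_scaled = preprocess.fit_transform(data)

# New rows reuse the statistics learned during fit (no refitting)
new_rows = np.array([[np.nan, 4]])
new_scaled = preprocess.transform(new_rows)

print(data_scaled)
print(new_scaled)
```

Fitting the pipeline on training data only and calling `transform` on test data keeps test-set statistics from leaking into preprocessing.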



### Key Libraries for Data Science

1. **Data Manipulation**:
   - pandas: Data analysis and manipulation
   - numpy: Numerical computing

2. **Visualization**:
   - matplotlib: Basic plotting library
   - seaborn: Statistical data visualization
   - plotly: Interactive visualizations

3. **Machine Learning**:
   - scikit-learn: Traditional machine learning
   - tensorflow: Deep learning
   - keras: High-level neural networks API

4. **Statistical Analysis**:
   - scipy: Scientific computing
   - statsmodels: Statistical models

Remember to:
- Always explore and clean your data first
- Choose appropriate visualization methods
- Use proper train-test splits
- Validate your models
- Document your analysis
- Use version control for your code

These examples demonstrate basic usage of Python's data science libraries. Each library has many more features and capabilities for specific use cases.
