# 📊 Data Types and Their Properties

**Goal:** Learn the different types of data used in data science

---

## 🔢 1. Numerical Data

### Properties:
- Expressed as numbers
- Supports mathematical operations
- Straightforward to analyze statistically

### Types:

#### **1.1 Continuous:**
- Can take any value within a range
- Obtained by measurement and expressed in a unit
- **Examples:** height, weight, temperature, salary

#### **1.2 Discrete:**
- Takes only countable values, usually whole numbers
- Obtained by counting
- **Examples:** number of children, number of books, number of errors
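
A rough way to tell the two kinds apart in code is to check the dtype and whether every value is a whole number. The sketch below uses a hypothetical helper, `numeric_kind`, which is not part of pandas. Its rule is only a first guess: a continuous measurement rounded to whole units would still be labeled discrete.

```python
import pandas as pd

def numeric_kind(values):
    """Guess whether a numeric column is discrete or continuous.

    Hypothetical helper for illustration: integer dtypes, or floats that
    are all whole numbers, are treated as discrete.
    """
    series = pd.Series(values)
    if not pd.api.types.is_numeric_dtype(series):
        raise TypeError("numeric_kind expects numeric values")
    if pd.api.types.is_integer_dtype(series):
        return "discrete"
    # Float column: discrete only if every non-missing value is whole
    non_missing = series.dropna()
    if (non_missing % 1 == 0).all():
        return "discrete"
    return "continuous"

print(numeric_kind([165.5, 170.2, 158.7]))  # heights in cm
print(numeric_kind([20, 19, 21]))           # ages in whole years
```

Domain knowledge should have the final say: the helper only inspects the stored values, not how they were collected.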

In [None]:
# Numerical data example
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Continuous data: student heights (cm)
heights = [165.5, 170.2, 158.7, 175.9, 162.3, 168.1, 173.4, 160.8]

# Discrete data: student ages (whole years)
ages = [20, 19, 21, 22, 20, 19, 23, 21]

# Build a DataFrame
students_df = pd.DataFrame({
    'height': heights,
    'age': ages
})

print("📊 NUMERICAL DATA")
print("=" * 20)
print(students_df)

print("\n📈 Summary statistics:")
print(students_df.describe())

## 🏷️ 2. Categorical Data

### Properties:
- Values fall into distinct categories
- Mathematical operations are limited
- Well suited to counting and grouping

### Types:

#### **2.1 Nominal:**
- Categories have no order
- Used only to tell items apart
- **Examples:** gender, color, country, occupation

#### **2.2 Ordinal:**
- Categories have a meaningful order
- Values can be ranked, but the distance between ranks is not defined
- **Examples:** education level, grade, satisfaction level
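
The difference shows up directly in what pandas allows on each kind. The following is a minimal sketch with illustrative values (the color and education labels are assumptions, not data from this notebook): an ordered `pd.Categorical` supports `min`, `max`, and comparisons, while an unordered one raises `TypeError` for them.

```python
import pandas as pd

# Nominal: categories have no order
colors = pd.Categorical(["red", "blue", "red", "green"])

# Ordinal: categories are listed from lowest to highest and marked ordered
levels = pd.Categorical(
    ["Master", "Bachelor", "Doctorate", "Master"],
    categories=["Bachelor", "Master", "Doctorate"],
    ordered=True,
)

print(colors.ordered)               # False
print(levels.min(), levels.max())   # lowest and highest category present
print(list(levels > "Bachelor"))    # element-wise comparison with a category

try:
    colors.min()  # not defined for unordered categories
except TypeError as err:
    print("Nominal data has no minimum:", err)
```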

In [None]:
# Categorical data example
# Sample values are kept in Uzbek; English glosses are given in the comments.

# Nominal data
# Gender: 'Ayol' = female, 'Erkak' = male
genders = ['Ayol', 'Erkak', 'Ayol', 'Ayol', 'Erkak', 'Ayol', 'Erkak', 'Ayol']
# Major: 'Tibbiyot' = medicine, 'Iqtisod' = economics
majors = ['IT', 'Tibbiyot', 'IT', 'Iqtisod', 'IT', 'Tibbiyot', 'Iqtisod', 'IT']

# Ordinal data
# Education: 'Bakalavr' < 'Magistr' < 'Doktor' (bachelor's < master's < doctorate)
education_levels = ['Bakalavr', 'Magistr', 'Bakalavr', 'Doktor', 'Bakalavr', 'Magistr', 'Bakalavr', 'Magistr']
# Satisfaction: 'O'rta' < 'Yaxshi' < 'Juda yaxshi' (average < good < very good)
satisfaction = ['Yaxshi', 'Juda yaxshi', 'O\'rta', 'Yaxshi', 'Juda yaxshi', 'Yaxshi', 'O\'rta', 'Juda yaxshi']

# Add the columns to the DataFrame
students_df['gender'] = genders
students_df['major'] = majors
students_df['education'] = education_levels
students_df['satisfaction'] = satisfaction

print("🏷️ CATEGORICAL DATA")
print("=" * 30)
print(students_df)

print("\n📊 Distribution by category:")
print("\nBy gender:")
print(students_df['gender'].value_counts())

print("\nBy major:")
print(students_df['major'].value_counts())

## 📝 3. Text Data

### Properties:
- Consists of words, sentences, and paragraphs
- Analysis requires Natural Language Processing (NLP) techniques
- Usually needs substantial preprocessing

### Types:
- **Structured text:** Short text in fixed fields, such as form entries or database columns
- **Unstructured text:** Books, articles, blogs, SMS, social media posts
- **Semi-structured text:** HTML and XML files, emails (structured headers plus a free-text body)
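
Preprocessing typically starts with normalizing case and splitting text into tokens. The sketch below is one simple way to do that with the standard library only; `tokenize` is a hypothetical helper, and the regular expression is an assumption that suits short Latin-script text (it keeps apostrophes inside words, which matters for Uzbek forms such as "o'rganish").

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its word tokens.

    Apostrophes are kept inside words so that forms such as "o'rganish"
    or "isn't" stay in one piece; digits and punctuation are dropped.
    """
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())

sample = "Python is useful. Learning Python isn't hard!"
tokens = tokenize(sample)
print(tokens)
print(Counter(tokens).most_common(1))  # [('python', 2)]
```

Without this step, "Python" and "python." would be counted as different words.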

In [None]:
# Text data example
from collections import Counter

# Student comments (sample data, in Uzbek)
comments = [
    "Bu kurs juda qiziq va foydali. Python o'rganish qiyin emas.",
    "Data Science haqida ko'p narsa o'rgandim. Rahmat!",
    "Darslar aniq va tushunarli. Davom eting!",
    "AI sohasida ishlashni xohlayman. Bu kurs yordam berdi.",
    "Jupyter Notebook juda qulay. Tavsiya qilaman.",
    "O'qituvchi professional. Ma'ruzalar sifatli.",
    "Amaliy mashqlar yetarli. Ko'proq practice kerak.",
    "Python kutubxonalari haqida yangi bilimlar oldim."
]

students_df['comments'] = comments

print("📝 TEXT DATA")
print("=" * 25)

# Text analysis: comment length in characters and in words
students_df['comment_length'] = students_df['comments'].str.len()
students_df['word_count'] = students_df['comments'].str.split().str.len()

print("Comment statistics:")
print(f"Average comment length: {students_df['comment_length'].mean():.1f} characters")
print(f"Average word count: {students_df['word_count'].mean():.1f} words")

# Most frequent words
all_text = ' '.join(comments).lower()
# Strip trailing punctuation so that "kurs" and "kurs." count as the same word
words = [word.strip('.,!?') for word in all_text.split()]
common_words = Counter(words).most_common(5)

print("\nMost frequent words:")
for word, count in common_words:
    print(f"'{word}': {count} times")

## 🖼️ 4. Image Data

### Properties:
- Made up of pixel values, typically 0-255 for 8-bit images
- Stored as 2D arrays (grayscale) or 3D arrays (RGB)
- Analyzed with computer vision algorithms
- Often very large in size
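
These properties can be checked directly on a NumPy array. The sketch below assumes a small hypothetical 4x4 RGB image filled with random values. It shows the array shape and memory use, a grayscale conversion using the ITU-R BT.601 luma weights (one common choice), and scaling to the range [0, 1].

```python
import numpy as np

# A hypothetical 4x4 RGB image stored as 8-bit unsigned integers (0-255)
rng = np.random.default_rng(seed=0)
rgb = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

# Grayscale conversion with the ITU-R BT.601 luma weights
weights = np.array([0.299, 0.587, 0.114])
gray = (rgb @ weights).round().astype(np.uint8)

print(rgb.shape, rgb.dtype, rgb.nbytes)     # (4, 4, 3) uint8 48
print(gray.shape, gray.dtype, gray.nbytes)  # (4, 4) uint8 16

# Scaling to floats in [0, 1] is a common preprocessing step
scaled = rgb.astype(np.float32) / 255.0
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
```

The byte counts grow with width x height x channels, which is why real images quickly become large.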

### Formats:
- **JPEG, PNG:** Common formats
- **TIFF, RAW:** High-quality formats
- **DICOM:** Medical images
- **Video:** A sequence of images (frames)

In [None]:
# Image data example (synthetic data)
import matplotlib.pyplot as plt
import numpy as np

# Create simple 64x64-pixel images
image_size = 64

# Random image: integer pixel values in 0-255
random_image = np.random.randint(0, 256, (image_size, image_size, 3))

# Gradient image: float values in [0, 1]
x = np.linspace(0, 1, image_size)
y = np.linspace(0, 1, image_size)
X, Y = np.meshgrid(x, y)
gradient_image = np.stack([X, Y, 0.5 * np.ones_like(X)], axis=2)

# Circle image (separate names avoid overwriting x and y above)
center = image_size // 2
rows, cols = np.ogrid[:image_size, :image_size]
mask = (cols - center) ** 2 + (rows - center) ** 2 <= (image_size // 4) ** 2
circle_image = np.zeros((image_size, image_size, 3))
circle_image[mask] = [1, 0, 0]  # Red circle

print("🖼️ IMAGE DATA")
print("=" * 25)
print(f"Image size: {image_size}x{image_size} pixels")
print("Color channels: 3 (RGB)")
print(f"Data size: {image_size * image_size * 3} values")

# Display the images
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

axes[0].imshow(random_image.astype(np.uint8))
axes[0].set_title('Random Image')
axes[0].axis('off')

axes[1].imshow(gradient_image)
axes[1].set_title('Gradient Image')
axes[1].axis('off')

axes[2].imshow(circle_image)
axes[2].set_title('Geometric Shape')
axes[2].axis('off')

plt.tight_layout()
plt.show()

# Image statistics
print("\nRandom image:")
print(f"  Min value: {random_image.min()}")
print(f"  Max value: {random_image.max()}")
print(f"  Mean: {random_image.mean():.1f}")

## ⏰ 5. Time Series Data

### Properties:
- Ordered in time
- Often shows trend and seasonality
- Used for forecasting
- Observations depend on earlier ones (temporal dependency)

### Examples:
- **Stock prices:** Share prices over time
- **Weather data:** Temperature, rainfall
- **Sales data:** Daily or monthly sales
- **Sensor data:** Readings from IoT devices
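
Trend and seasonality can be separated with very simple tools. The sketch below builds a hypothetical noise-free daily series (a linear trend plus a 7-day cycle, both assumptions for illustration) and shows that a centered 7-day rolling mean, and likewise a weekly resample, averages the cycle away and leaves the trend.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: linear trend plus a 7-day cycle (no noise)
index = pd.date_range("2024-01-01", periods=28, freq="D")
trend = np.linspace(100, 127, 28)                    # +1 per day
weekly = 10 * np.sin(2 * np.pi * np.arange(28) / 7)  # repeats every 7 days
series = pd.Series(trend + weekly, index=index)

# A centered 7-day rolling mean averages out the weekly cycle
rolling = series.rolling(window=7, center=True).mean()
print(rolling.dropna().head(3).round(2))

# Resampling aggregates the daily values into weekly means
weekly_means = series.resample("W").mean()
print(weekly_means.round(2))
```

With real, noisy data the rolling mean only approximates the trend, and the window length has to match the seasonal period for the cycle to cancel.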

In [None]:
# Time series data example
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

# Generate 30 days of data
start_date = datetime(2024, 1, 1)
dates = [start_date + timedelta(days=i) for i in range(30)]

# Synthetic temperature data (Tashkent)
base_temp = 15  # Baseline temperature (°C) for the synthetic series
trend = np.linspace(0, 5, 30)  # Gradual warming over the period
seasonal = 5 * np.sin(np.linspace(0, 4*np.pi, 30))  # Periodic component: two full cycles over 30 days
noise = np.random.normal(0, 2, 30)  # Random variation
temperature = base_temp + trend + seasonal + noise

# Sales data (daily sales)
base_sales = 1000
# NumPy array so the weekend boost adds element-wise (int + list raises TypeError)
weekend_boost = np.array([200 if date.weekday() >= 5 else 0 for date in dates])
sales_noise = np.random.normal(0, 100, 30)
sales = base_sales + weekend_boost + sales_noise

# Build a DataFrame
timeseries_df = pd.DataFrame({
    'date': dates,
    'temperature': temperature,
    'sales': sales
})

print("⏰ TIME SERIES DATA")
print("=" * 35)
print(timeseries_df.head())

# Time series summary
print(f"\nDate range: {dates[0].strftime('%Y-%m-%d')} to {dates[-1].strftime('%Y-%m-%d')}")
print(f"Number of days: {len(dates)}")
print(f"Average temperature: {temperature.mean():.1f}°C")
print(f"Average sales: {sales.mean():.0f} UZS")

# Visualization
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))

# Temperature plot
ax1.plot(dates, temperature, 'b-', linewidth=2, label='Temperature')
ax1.set_title('Daily Temperature')
ax1.set_ylabel('Temperature (°C)')
ax1.grid(True, alpha=0.3)
ax1.legend()

# Sales plot
ax2.plot(dates, sales, 'g-', linewidth=2, label='Sales', marker='o', markersize=4)
ax2.set_title('Daily Sales Volume')
ax2.set_ylabel('Sales (UZS)')
ax2.set_xlabel('Date')
ax2.grid(True, alpha=0.3)
ax2.legend()

plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## 🔄 6. Identifying and Converting Data Types

### Data types in pandas:

In [None]:
# Collect all the data into a single DataFrame
final_df = pd.DataFrame({
    'student_id': range(1, 9),
    'height': heights,
    'age': ages,
    'gender': genders,
    'major': majors,
    'satisfaction': satisfaction,
    'comment_length': students_df['comment_length'],
    'registration_date': pd.date_range('2024-01-01', periods=8, freq='D')
})

print("🔍 DATA TYPE ANALYSIS")
print("=" * 30)

# Show the data types
print("DataFrame data types:")
print(final_df.dtypes)

print("\nInformation about each column:")
final_df.info()  # info() prints its report itself and returns None

# Convert the categorical columns to the category dtype
final_df['gender'] = final_df['gender'].astype('category')
final_df['major'] = final_df['major'].astype('category')

# Make satisfaction ordinal (average < good < very good)
satisfaction_order = ['O\'rta', 'Yaxshi', 'Juda yaxshi']
final_df['satisfaction'] = pd.Categorical(
    final_df['satisfaction'],
    categories=satisfaction_order,
    ordered=True
)

print("\nAfter conversion:")
print(final_df.dtypes)

# Show memory usage
print("\nMemory usage:")
print(final_df.memory_usage(deep=True))

## 📊 7. Analysis Methods Suited to Each Data Type

### Statistical methods appropriate for each type:

In [None]:
print("📊 ANALYSIS BY DATA TYPE")
print("=" * 35)

# For numerical data
print("🔢 NUMERICAL DATA ANALYSIS:")
print("-" * 30)
print("Height statistics:")
print(f"  Mean: {final_df['height'].mean():.1f} cm")
print(f"  Median: {final_df['height'].median():.1f} cm")
print(f"  Standard deviation: {final_df['height'].std():.1f} cm")
print(f"  Min-Max: {final_df['height'].min():.1f} - {final_df['height'].max():.1f} cm")

# For categorical data
print("\n🏷️ CATEGORICAL DATA ANALYSIS:")
print("-" * 35)
print("Gender distribution:")
gender_counts = final_df['gender'].value_counts()
for gender, count in gender_counts.items():
    percentage = (count / len(final_df)) * 100
    print(f"  {gender}: {count} ({percentage:.1f}%)")

print("\nMajor distribution:")
major_counts = final_df['major'].value_counts()
for major, count in major_counts.items():
    percentage = (count / len(final_df)) * 100
    print(f"  {major}: {count} ({percentage:.1f}%)")

# For ordinal data
print("\n📈 ORDINAL DATA ANALYSIS:")
print("-" * 30)
print("Satisfaction level distribution:")
# sort_index() lists the levels in their category order rather than by frequency
satisfaction_counts = final_df['satisfaction'].value_counts().sort_index()
for level, count in satisfaction_counts.items():
    percentage = (count / len(final_df)) * 100
    print(f"  {level}: {count} ({percentage:.1f}%)")

# For date and time data
print("\n⏰ DATE AND TIME DATA ANALYSIS:")
print("-" * 25)
print(f"Registration period: {final_df['registration_date'].min().strftime('%Y-%m-%d')} to {final_df['registration_date'].max().strftime('%Y-%m-%d')}")
print(f"Total period: {(final_df['registration_date'].max() - final_df['registration_date'].min()).days + 1} days")

## ✅ Summary and Recommendations

### 🎯 Identifying data types correctly matters:

1. **Numerical data:** Mean, median, correlation analysis
2. **Categorical data:** Frequency analysis, chi-square test
3. **Text data:** NLP preprocessing, sentiment analysis
4. **Image data:** Computer vision, CNN models
5. **Time series data:** Trend analysis, forecasting
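
For tabular columns, the mapping from type to method can be expressed as a dispatch on the pandas dtype. The following is a minimal sketch: `summarize_column` is a hypothetical helper, the sample values are illustrative, and the chosen statistics are one reasonable default, not a fixed rule. Text and image data need dedicated tooling and are not covered here.

```python
import pandas as pd

def summarize_column(series):
    """Return a small summary chosen by the column's dtype (illustrative helper)."""
    if pd.api.types.is_datetime64_any_dtype(series):
        return {"kind": "datetime", "start": series.min(), "end": series.max()}
    if pd.api.types.is_numeric_dtype(series):
        return {"kind": "numeric", "mean": series.mean(), "median": series.median()}
    # Everything else (object, category) is treated as categorical
    counts = series.value_counts()
    return {
        "kind": "categorical",
        "mode": counts.index[0],
        "frequencies": counts.to_dict(),
    }

df = pd.DataFrame({
    "height": [165.5, 170.2, 158.7, 175.9],
    "major": ["IT", "Medicine", "IT", "Economics"],
    "registered": pd.date_range("2024-01-01", periods=4, freq="D"),
})

for name in df.columns:
    print(name, summarize_column(df[name]))
```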

### 💡 Next steps:
- Learn data cleaning methods
- Exploratory Data Analysis (EDA)
- Visualization techniques
- Feature engineering for machine learning

---

**📝 Exercise:** Complete the following tasks:
1. Identify the data types in your own dataset
2. Run the appropriate statistical analysis for each type
3. Assess the quality of the data
4. Prepare the data for further analysis